
Memory usage #40

Open
plattsad opened this issue Mar 2, 2021 · 2 comments

plattsad commented Mar 2, 2021

Hi all,

Is there a way to cut back the memory usage of hal2vg a bit? I have a 9.7 Gbase HAL from a 32-way alignment of small-to-moderate (150-800 Mbase) genomes. Memory usage during the conversion gradually increases; after 4 hours I'm up to the pinching of the 4th leaf genome and heading past 280 GB of real RAM used. I'd hoped this would complete on a 400 GB server, but now I'm having doubts as to whether it will even complete on a 1 TB server.

The command line is below. I've tried different --chop values (32, 10000) to see whether this changes the memory usage; it doesn't seem to.

./hal2vg ./P32Out.hal --hdf5InMemory --chop 10000 --noAncestors --progress > p32.pg
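
For reference, a sketch of how the peak could be captured, wrapping the same command in GNU time (its -v flag reports maximum resident set size; the log filename is arbitrary):

# /usr/bin/time is GNU time, not the shell builtin; -v prints detailed
# stats, including "Maximum resident set size (kbytes)", to stderr.
# The --progress output lands in the same log.
/usr/bin/time -v ./hal2vg ./P32Out.hal --hdf5InMemory --chop 10000 \
    --noAncestors --progress > p32.pg 2> hal2vg.time.log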

Thanks!

glennhickey (Collaborator)

The time and memory per genome should decrease for each successive genome, for whatever that's worth.

In general, I've mostly tested this on alignments between human genomes, where the memory usage is bad but still much better than what you're describing (~1 hour and a couple hundred GB for nearly 100 human chr1s). I'd like to make it more efficient, but I don't have any immediate plans.

What kind of data are you running on? If you have a bunch of very diverse genomes, it will not only take a ton of RAM, but the output graph will be so fragmented that I'm not sure what it could be used for.

plattsad (Author) commented Mar 2, 2021

Thanks Glenn. Yup, maybe I was just being too optimistic here; even the MAF export is pretty fragmented. This is an alignment across plants with maybe a 100 MY MRCA, so probably too much divergence to be useful. I killed the process as it passed 330 GB and will look to focus more on lineages, something like the sketch below.
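
Per clade, roughly (halExtract is from the HAL toolkit, assuming I've got its --root option right; the internal-node name Anc12 is a placeholder):

# Extract the subtree rooted at one internal node into its own HAL,
# then convert just that clade. "Anc12" is a hypothetical ancestor name.
halExtract P32Out.hal clade.hal --root Anc12
./hal2vg clade.hal --hdf5InMemory --chop 10000 --noAncestors --progress > clade.pg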
