yaml writer slows down execution #2499
Comments
Thanks for the thorough report @ChrisBNEU. I think a short-term solution would be to add a way to optionally disable this writer from the input, but the long-term fix would be to use multiprocessing to spawn a task that runs the yaml writing in parallel with the simulation itself. We could also look at a different output format that writes faster, like JSON.
I found this an interesting read. (One of those Stack Overflow posts where the best answer is in fact the one with the fewest upvotes.) "How is it that json serialization is so much faster than yaml serialization in Python?"
I think at least a significant part of this is because we generate SMILES twice for every species when generating the yaml: https://github.com/ReactionMechanismGenerator/RMG-Py/blob/ef83a1c0e33854aaf749484755cc02a943c1ab01/rmgpy/yml.py#L101C5-L101C5 and https://github.com/ReactionMechanismGenerator/RMG-Py/blob/ef83a1c0e33854aaf749484755cc02a943c1ab01/rmgpy/yml.py#L115C58-L115C58. I think using Molecule.smiles instead will speed up yaml generation quite significantly.
I linked a PR for this, but it is the temporary fix we mentioned: I just added an input file argument. Should it be linked to this issue, or do we want to leave the issue open or move it to a discussion after the PR is merged?
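To illustrate the difference between regenerating an identifier on every call and reading a cached property, here is a minimal sketch. `ToyMolecule`, `to_smiles`, and `generation_count` are hypothetical stand-ins invented for this example, not RMG-Py's actual `Molecule` implementation.

```python
from functools import cached_property


class ToyMolecule:
    """Hypothetical stand-in for a molecule class; not RMG-Py's Molecule."""

    def __init__(self, atoms):
        self.atoms = atoms
        self.generation_count = 0  # counts how often the expensive path runs

    def to_smiles(self):
        """Pretend-expensive identifier generation, recomputed on every call."""
        self.generation_count += 1
        return "".join(self.atoms)

    @cached_property
    def smiles(self):
        """Computed on first access, then served from the instance."""
        return self.to_smiles()


mol = ToyMolecule(["C", "C"])
first = mol.smiles   # triggers one call to to_smiles()
second = mol.smiles  # served from the cache, no regeneration
```

The trade-off is that a cached value goes stale if the underlying structure is modified afterwards, so this only helps when the object is effectively immutable by the time it is serialized.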
This should speed up execution time by accessing the property each time instead of regenerating it, see : ReactionMechanismGenerator#2499 (comment)
I profiled the same run, with the updated code in yml.py that Jackson added. I think the use of |
Okay, I took a closer look at the logs. The edge in this ethane pyrolysis system is incredibly small (since ethane is a small molecule), with edge-to-core ratios of only ~3 for species and ~1.5 for reactions. I think this significantly reduces the reaction generation costs, which puts them on the same order as the simulation and the yaml file generation at this stage of mechanism generation. I'm still surprised that the yaml costs us so much to generate, but my suspicion is that this won't matter in performance-critical runs, since simulation and reaction generation costs will be much higher in those cases while yaml generation will scale linearly with core size. I wonder if there's a clever way, instead of generating an entirely new yaml each time, to just copy the last yaml file and "append" only the new species/reactions.
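The "append only what is new" idea could look roughly like the sketch below. It is a toy under stated assumptions: `IncrementalWriter` and its attributes are hypothetical names, the output is JSON Lines rather than the real yaml layout, and it assumes the core only grows and that entries already written never change.

```python
import json
import os
import tempfile


class IncrementalWriter:
    """Hypothetical append-only writer; names are illustrative, not RMG-Py API."""

    def __init__(self, path):
        self.path = path
        self.n_written = 0  # how many core entries are already on disk

    def update(self, core_entries):
        """Append only the entries added since the previous call."""
        new_entries = core_entries[self.n_written:]
        with open(self.path, "a", encoding="utf-8") as f:
            for entry in new_entries:
                f.write(json.dumps(entry) + "\n")
        self.n_written = len(core_entries)
        return len(new_entries)


tmp_dir = tempfile.mkdtemp()
writer = IncrementalWriter(os.path.join(tmp_dir, "core.jsonl"))

core = [{"label": "C2H6"}, {"label": "CH3"}]
appended_first = writer.update(core)   # writes both entries

core.append({"label": "C2H5"})
appended_second = writer.update(core)  # writes only the new entry

with open(writer.path, encoding="utf-8") as f:
    lines_on_disk = f.read().splitlines()
```

With this approach the cost of each write depends on the number of new entries rather than on the full core size. If previously written entries can be modified between writes, a full rewrite (or some change tracking) would still be needed.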
We can cache the results of |
After calling `obj_to_dict` with a species once, the result is saved in a lookup dictionary so that subsequent calls are 'instant'. This costs memory, but for an expensive function like this, which inherently ends up being called on the same input many times, it should be worth it. Related comment: ReactionMechanismGenerator#2499 (comment)
This issue is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant issue, otherwise it will automatically be closed in 30 days.
Given the significance of the slowdown and the open PR which (nearly) fixes it, I'm going to label this as a bug.
Bug Description
It appears that the yaml writer adds a lot of overhead during the execution of an RMG run. I ran a modified ethane pyrolysis input file with cProfile. The only thing I changed in RMG was to skip writing the yaml file for RMS on every execution. I did that by commenting out this line in main.py:
The difference in execution times was:
I attached the profiles and the RMG logs below. It is possible this becomes insignificant at longer execution times; I have not tried it. The profiler graph shows the writer executing every time the model is enlarged, so it is also possible it gets worse with larger mechanisms.
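For readers who want to try the same with/without comparison on a small scale, here is a toy sketch using the standard library's `cProfile` and `pstats`. `write_output` and `enlarge_model` are hypothetical stand-ins for the writer and the model-enlargement loop, not RMG code.

```python
import cProfile
import pstats


def write_output(n):
    """Hypothetical stand-in for a slow writer step."""
    return sum(len(str(i)) for i in range(n))


def enlarge_model(steps, write=True):
    """Toy loop that optionally calls the writer on every step."""
    total = 0
    for _ in range(steps):
        total += 1
        if write:
            write_output(2000)
    return total


def profiled_function_names(write):
    """Profile one run and return the names of all functions that were called."""
    profiler = cProfile.Profile()
    profiler.enable()
    enlarge_model(20, write=write)
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats is keyed by (filename, line number, function name)
    return {func_name for (_file, _line, func_name) in stats.stats}


with_writer = profiled_function_names(write=True)
without_writer = profiled_function_names(write=False)
```

Calling `pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)` on either profile prints the most expensive calls, which is a quick way to see whether the writer dominates.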
How To Reproduce
Comment out the line specified above, then run the profiler on the input file I attached below (`python-jl rmg.py -p input.py`).
Installation Information
Describe your installation method and system information.
Additional Context
input file used:
input.py.zip
profiles:
profile_with_listener.pdf
profile_without_listener.pdf
rmg logs:
with_listener.log
without_listener.log