yaml writer slows down execution #2499

ChrisBNEU · 2023-07-10T03:36:04Z

Bug Description

It appears that the YAML writer adds a lot of overhead to an RMG run. I profiled a modified ethane pyrolysis input file with cProfile. The only change I made to RMG was to stop writing a YAML file for RMS on every iteration, which I did by commenting out this line in main.py:

self.attach(RMSWriter(self.output_directory))

The execution times were:

with RMSWriter listener: 15 min 15 sec
without RMSWriter listener: 7 min 35 sec
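
Converting the two wall-clock times to seconds shows the listener accounts for about half of the run. A small sketch of that arithmetic, assuming the two runs differ only in whether the RMSWriter listener is attached:

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes-and-seconds wall-clock reading to seconds."""
    return minutes * 60 + seconds


with_listener = to_seconds(15, 15)     # 915 s
without_listener = to_seconds(7, 35)   # 455 s

# Assumes the two runs differ only in whether RMSWriter is attached.
overhead = with_listener - without_listener    # 460 s
overhead_fraction = overhead / with_listener   # share of the run with the listener
speedup = with_listener / without_listener

print(f"overhead: {overhead} s ({overhead_fraction:.1%} of the run)")
print(f"speedup without listener: {speedup:.2f}x")
# prints:
# overhead: 460 s (50.3% of the run)
# speedup without listener: 2.01x
```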

I attached the profiles and the RMG logs below. This overhead may become insignificant for longer runs; I have not tried that. The profiler graph shows the writer executing every time the model is enlarged, so it may also get worse with larger mechanisms.
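
One way to see why a per-enlargement write could get worse with larger mechanisms: if every enlargement re-serializes the whole core, the total number of entries written is the sum of the core sizes over all iterations, which grows roughly quadratically when the core grows steadily, whereas writing only new entries grows linearly. The sketch counts entries for a made-up growth pattern (10 species per enlargement); `entries_written` is a hypothetical helper, not RMG code, and the numbers are not measurements.

```python
def entries_written(core_sizes: list[int], rewrite_everything: bool) -> int:
    """Total species entries serialized over a run.

    core_sizes[i] is the (hypothetical) core size after enlargement i.
    """
    total = 0
    previous = 0
    for size in core_sizes:
        # Full rewrite serializes the whole core; incremental only the new part.
        total += size if rewrite_everything else size - previous
        previous = size
    return total


# Hypothetical run: the core grows by 10 species per enlargement, 100 times.
sizes = [10 * (i + 1) for i in range(100)]
print(entries_written(sizes, rewrite_everything=True))   # 50500
print(entries_written(sizes, rewrite_everything=False))  # 1000
```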

How To Reproduce

Comment out the line specified above, then run the profiler on the input file attached below (python-jl rmg.py -p input.py).

Installation Information

macOS 10.15.7
Installed from source via conda
RMG version information:
- RMG-Py: 3.0.0-1437-gef83a1c0e
- RMG-database: 3.1.0-616-g75fabcb6c

Additional Context

input file used:
input.py.zip

profiles:
profile_with_listener.pdf
profile_without_listener.pdf

rmg logs:
with_listener.log
without_listener.log

JacksonBurns · 2023-07-10T12:15:19Z

Thanks for the thorough report @ChrisBNEU. I think a short-term solution would be to add a way to optionally disable this writer from the input file, while the long-term solution would be to use multiprocessing to spawn a task that writes the YAML in parallel with the simulation itself. We could also look at a different output format that writes faster, like JSON.
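
The long-term idea of writing in parallel with the simulation can be sketched with `concurrent.futures`. This is a hedged sketch, not RMG code: `BackgroundWriter` and `write_snapshot` are hypothetical names, and JSON stands in for the YAML serializer so the example stays standard-library only. The demo passes a `ThreadPoolExecutor` for simplicity; because serialization is CPU-bound Python, a thread still competes with the simulation for the GIL, so the multiprocessing version suggested above would pass a `ProcessPoolExecutor` instead (the function and its arguments must then be picklable, and scripts need an `if __name__ == "__main__":` guard).

```python
import copy
import json
import tempfile
from concurrent.futures import Executor, Future, ThreadPoolExecutor
from pathlib import Path
from typing import Optional


def write_snapshot(path: str, model: dict) -> str:
    """Serialize one model snapshot (JSON stands in for the YAML writer)."""
    Path(path).write_text(json.dumps(model, indent=2))
    return path


class BackgroundWriter:
    """Hypothetical listener that hands each write to an executor."""

    def __init__(self, executor: Executor, path: str):
        self.executor = executor
        self.path = path
        self.pending: Optional[Future] = None

    def update(self, model: dict) -> None:
        # Finish the previous write first so two writes never race on one file.
        self.finish()
        # Copy now, so the simulation can keep mutating `model` during the write.
        snapshot = copy.deepcopy(model)
        self.pending = self.executor.submit(write_snapshot, self.path, snapshot)

    def finish(self) -> None:
        # Block until the outstanding write (if any) is done; re-raises its errors.
        if self.pending is not None:
            self.pending.result()
            self.pending = None


with tempfile.TemporaryDirectory() as tmp, ThreadPoolExecutor(max_workers=1) as pool:
    writer = BackgroundWriter(pool, str(Path(tmp) / "chem.json"))
    model = {"species": ["ethane"]}
    writer.update(model)            # returns immediately; the write runs in the pool
    model["species"].append("CH3")  # the simulation keeps going meanwhile
    writer.update(model)
    writer.finish()
    print(Path(writer.path).read_text())
```

Waiting on the previous write in `update` keeps at most one write in flight; if writes are slower than enlargements, the main loop will block there rather than queue up stale snapshots.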

rwest · 2023-07-10T13:44:10Z

I found this an interesting read (one of those Stack Overflow posts where the best answer is in fact the one with the fewest upvotes): "How is it that json serialization is so much faster than yaml serialization in Python?"
I'm sure we could make it much faster with a little effort, especially as we're outputting the same few things every time and could hard-code the templates. It's worth looking into as we switch to Cantera YAML output (#2078).
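
A minimal sketch of the hard-coded-template idea, assuming each species entry only needs a name and a SMILES string (the keys are illustrative, not the RMS or Cantera schema, and `render_species` is hypothetical). Instead of walking a generic object tree, it formats a fixed template per entry; `json.dumps` is used only to produce safely quoted strings.

```python
import json


def render_species(species: list[tuple[str, str]]) -> str:
    """Render a YAML species list from a fixed template.

    Entries are hypothetical (name, smiles) pairs; the keys are illustrative
    and are not the RMS or Cantera schema.
    """
    lines = ["species:"]
    for name, smiles in species:
        # json.dumps emits a double-quoted string, which YAML also accepts as a
        # double-quoted scalar for ASCII text, so '#' in SMILES can't break it.
        lines.append(f"- name: {json.dumps(name)}")
        lines.append(f"  smiles: {json.dumps(smiles)}")
    return "\n".join(lines) + "\n"


print(render_species([("ethane", "CC"), ("C2H", "[C]#C")]), end="")
# species:
# - name: "ethane"
#   smiles: "CC"
# - name: "C2H"
#   smiles: "[C]#C"
```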

mjohnson541 · 2023-07-10T13:49:56Z

I think at least a significant part of this is due to the fact that we're generating smiles twice for every species when generating the yaml: https://github.com/ReactionMechanismGenerator/RMG-Py/blob/ef83a1c0e33854aaf749484755cc02a943c1ab01/rmgpy/yml.py#L101C5-L101C5 and https://github.com/ReactionMechanismGenerator/RMG-Py/blob/ef83a1c0e33854aaf749484755cc02a943c1ab01/rmgpy/yml.py#L115C58-L115C58.

I think if we use Molecule.smiles instead, this will speed up YAML generation quite significantly.
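
The point about regenerating SMILES can be illustrated with a cached property. This sketch uses a hypothetical `Molecule` class with `functools.cached_property` (Python 3.8+); it is not how RMG's `Molecule` is implemented, which this thread does not show. One caveat of the pattern: a cached value goes stale if the molecule's structure changes after the first access.

```python
from functools import cached_property


class Molecule:
    """Hypothetical stand-in for an RMG Molecule, not RMG's implementation."""

    def __init__(self, smiles: str):
        self._smiles = smiles
        self.conversions = 0  # how many times the expensive conversion ran

    def to_smiles(self) -> str:
        # Pretend this is the costly graph-to-SMILES conversion.
        self.conversions += 1
        return self._smiles

    @cached_property
    def smiles(self) -> str:
        # Computed on first access, then stored on the instance.
        return self.to_smiles()


mol = Molecule("CC")
for _ in range(2):      # e.g. once for the species entry, once for a reaction
    mol.to_smiles()
print(mol.conversions)  # 2: calling the method always recomputes

mol = Molecule("CC")
for _ in range(2):
    _ = mol.smiles
print(mol.conversions)  # 1: the property reuses the first result
```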

ChrisBNEU · 2023-07-26T18:00:56Z

I linked a PR for this, but it is the temporary fix we mentioned: I just added an input file argument. Should it be linked to this issue, or do we want to leave this open (or move it to a discussion) after it's merged?
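
An input-file switch replaces commenting out `self.attach(RMSWriter(self.output_directory))` by hand. A sketch of that gating pattern with hypothetical names (`Job`, `YamlWriterStub`, `write_yaml_each_iteration`); the actual option name and wiring in #2508 may differ.

```python
class YamlWriterStub:
    """Hypothetical listener standing in for RMSWriter; only counts writes."""

    def __init__(self, output_directory: str):
        self.output_directory = output_directory
        self.writes = 0

    def update(self, subject) -> None:
        self.writes += 1


class Job:
    """Hypothetical subject that notifies listeners after each enlargement."""

    def __init__(self, write_yaml_each_iteration: bool = True):
        self.write_yaml_each_iteration = write_yaml_each_iteration  # input option
        self.listeners = []

    def attach(self, listener) -> None:
        self.listeners.append(listener)

    def notify(self) -> None:
        for listener in self.listeners:
            listener.update(self)

    def initialize(self, output_directory: str) -> None:
        # Gate the listener on an option instead of commenting the line out.
        if self.write_yaml_each_iteration:
            self.attach(YamlWriterStub(output_directory))


job = Job(write_yaml_each_iteration=False)
job.initialize("output")
for _ in range(3):  # three model enlargements
    job.notify()
print(len(job.listeners))  # 0: no per-iteration YAML writes
```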

This should speed up execution time by accessing the property each time instead of regenerating it; see ReactionMechanismGenerator#2499 (comment)

JacksonBurns · 2023-07-26T18:29:29Z

I think with a191fbd we can allow #2508 to close this (assuming the patch works).

ChrisBNEU · 2023-07-27T19:09:37Z

I profiled the same run with the updated code in yml.py that Jackson added. I think the molecule.smiles change helps: the share of the overhead time used by write_yml went from 46% to 30%. Unfortunately, writing the YAML at every iteration still appears to take a lot longer than writing one file at the end. I included the profiles and logs if people are curious. I think #2508 has us covered: if people want the run to go faster, they can just disable the per-iteration YAML write for now, and they can refer back to this issue if they want to make the YAML write itself faster.
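
For anyone repeating this comparison, a function's share of profiled time can be pulled out of cProfile data with `pstats.Stats.get_stats_profile()` (Python 3.9+, times rounded to milliseconds). In this sketch, `simulate`, `write_yml` and `run` are hypothetical stand-ins for the real workload; the share is cumulative time in the function divided by total profiled time, which may not be exactly how the percentages in the attached profile graphs were derived.

```python
import cProfile
import pstats


def simulate(n: int) -> int:
    # Hypothetical stand-in for the reactor simulation.
    return sum(i * i for i in range(n))


def write_yml(n: int) -> str:
    # Hypothetical stand-in for the YAML writer (not RMG's write_yml).
    return "\n".join(f"- name: S{i}" for i in range(n))


def run(iterations: int, n: int) -> None:
    # Mimic a listener that writes after every model enlargement.
    for _ in range(iterations):
        simulate(n)
        write_yml(n // 10)


def time_share(profiler: cProfile.Profile, func_name: str) -> float:
    """Fraction of total profiled time spent in func_name, callees included."""
    stats = pstats.Stats(profiler).get_stats_profile()  # Python 3.9+
    if stats.total_tt == 0:
        return 0.0
    return stats.func_profiles[func_name].cumtime / stats.total_tt


profiler = cProfile.Profile()
profiler.enable()
run(iterations=3, n=200_000)
profiler.disable()
print(f"write_yml: {time_share(profiler, 'write_yml'):.0%} of profiled time")
```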

Profile_with_yaml_each_iter.pdf
profile_yaml_at_end.pdf

log_with_yaml_each_iter.log
log_yaml_at_end.log

mjohnson541 · 2023-07-27T19:31:59Z

Okay, I took a closer look at the logs. The edge in this ethane pyrolysis system is incredibly small (since ethane is a small molecule), with edge-to-core ratios of only ~3 for species and ~1.5 for reactions. I think this significantly reduces the reaction generation costs, which puts them on the same order as the simulation and the YAML file generation at this stage of mechanism generation.

I'm still surprised to see YAML generation cost us so much, but my suspicion is that this won't matter in performance-critical runs: simulation and reaction generation costs will be much higher in those cases, while YAML generation scales linearly with core size. I wonder if there's a clever way to avoid generating an entirely new YAML file each time, e.g. copy the last one and "append" only the new species/reactions.
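
A sketch of the "append only what's new" idea, under two assumptions that would need checking against RMG: core species and reactions are only ever appended, and entries already written never change (for example, no updated kinetics for an existing core reaction). Because a file with separate `species:` and `reactions:` sections can't be extended with a plain file append, the hypothetical `IncrementalYamlWriter` caches each entry's rendered text and re-joins the chunks, so only new entries pay the rendering cost.

```python
from typing import Callable, List


class IncrementalYamlWriter:
    """Render only entries added since the previous write.

    Assumes core lists only grow by appending and written entries never
    change; both are assumptions, not RMG guarantees.
    """

    def __init__(self, render: Callable[[object], str]):
        self.render = render  # turns one item into its YAML text
        self.species_chunks: List[str] = []
        self.reaction_chunks: List[str] = []
        self.render_calls = 0

    def _extend(self, chunks: List[str], items: list) -> None:
        # Everything past the number of cached chunks is new since last time.
        for item in items[len(chunks):]:
            chunks.append(self.render(item))
            self.render_calls += 1

    def write(self, species: list, reactions: list) -> str:
        self._extend(self.species_chunks, species)
        self._extend(self.reaction_chunks, reactions)
        return ("species:\n" + "".join(self.species_chunks)
                + "reactions:\n" + "".join(self.reaction_chunks))


writer = IncrementalYamlWriter(lambda item: f"- {item}\n")
writer.write(["C2H6"], ["C2H6 <=> 2 CH3"])
text = writer.write(["C2H6", "CH3"], ["C2H6 <=> 2 CH3", "CH3 + C2H6 <=> CH4 + C2H5"])
print(writer.render_calls)  # 4, versus 6 if every write re-rendered everything
```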

JacksonBurns · 2023-07-27T19:39:34Z

We can cache the results of obj_to_dict, which should speed things up; I'm going to make another quick commit.

After calling `obj_to_dict` with a species once, the result is saved in a lookup dictionary so that subsequent calls are 'instant'. This costs memory, but for an expensive function like this that inherently ends up being called on the same input many times, it should be worth it. Related comment: ReactionMechanismGenerator#2499 (comment)
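
A sketch of caching a converter like `obj_to_dict` per object. It assumes a single-argument converter; if the real function in rmgpy/yml.py takes other arguments, they would have to be part of the cache key. Species objects may not be hashable, so this hypothetical `memoize_by_identity` decorator keys on `id(obj)` and keeps a reference to each object so its id cannot be recycled; that reference is where the extra memory goes. Cached results go stale if an object changes after its first conversion.

```python
import functools

conversions = []  # labels actually converted, to show cache hits


def memoize_by_identity(func):
    """Cache func(obj) per object instance, even if obj is unhashable.

    The key is id(obj). A reference to obj is stored next to the result, so
    the object stays alive and its id cannot be reused while the entry exists.
    Callers get the same cached dict back each time and must not mutate it.
    """
    cache = {}

    @functools.wraps(func)
    def wrapper(obj):
        key = id(obj)
        if key not in cache:
            cache[key] = (obj, func(obj))
        return cache[key][1]

    wrapper.cache = cache  # exposed for inspection or clearing
    return wrapper


class Species:
    """Hypothetical species; __eq__ without __hash__ makes it unhashable."""

    def __init__(self, label: str):
        self.label = label

    def __eq__(self, other):
        return isinstance(other, Species) and self.label == other.label


@memoize_by_identity
def obj_to_dict(spc: Species) -> dict:
    # Hypothetical stand-in for the expensive conversion in rmgpy/yml.py.
    conversions.append(spc.label)
    return {"name": spc.label}


ethane = Species("ethane")
obj_to_dict(ethane)
obj_to_dict(ethane)              # cache hit: no second conversion
obj_to_dict(Species("methyl"))
print(conversions)               # ['ethane', 'methyl']
```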

github-actions · 2023-10-26T08:07:08Z

This issue is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant issue, otherwise it will automatically be closed in 30 days.

JacksonBurns · 2023-10-26T13:32:48Z

Given the significance of the slowdown and the open PR which (nearly) fixes it, I'm going to label this as a bug.

ChrisBNEU linked a pull request Jul 26, 2023 that will close this issue

Allow disabling of RMS yaml writer for speedup #2508

Open

ChrisBNEU self-assigned this Jul 26, 2023

github-actions bot added the stale label (stale issue/PR as determined by actions bot) Oct 26, 2023

JacksonBurns added the bug label (bug which will never be closed by the actions bot) and removed the stale label Oct 26, 2023

rwest mentioned this issue Mar 15, 2024

Native YAML writer #2633

Open
