Benchmark examples with codspeed #443
Conversation
I think this needs an admin on the egraphs-good org to approve the app in codspeed; I had to "request" adding it to this org. I also would need admin permissions on this repo to add the
In response to some questions offline on any downsides to adding codspeed:

It introduces some more complexity overall in terms of depending on external services, and will add some more noise to PRs about their performance impacts. It may also increase the time it takes to run the CI, from maybe 2 mins to 4 mins? If we find this to be too detrimental we can exclude some of our longer-running examples from benchmarking. Performance is also not an exact science, so it has the possibility of introducing some false assurances, when in fact the impact might be different when running in a different environment. Also, it means that we are judging performance based on the example files, which might not be representative of performance-sensitive workloads.

Overall, though, I would judge the additional signal to be worth those costs, in helping us judge the impact of PRs on performance, especially those meant to make things faster. I think it will also encourage us to have representative example files. I have found codspeed to be invaluable in the Python package to help diagnose the impact of large changes on end-to-end performance, and it has helped me refine those changes so that the impacts are reasonable. Its built-in flamegraphs have also helped me narrow down where the slowdowns occur and see upstream slowdowns from egglog changes. So adding this instrumentation closer to the source would be helpful in getting that information sooner.
CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.
@@ -1293,7 +1293,6 @@ impl EGraph {
     filename.push(file.as_str());
     // append to file
     let mut f = File::options()
-        .write(true)
nit update from rust upgrade
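For context on this nit: `OpenOptions::append(true)` already opens the file with write access, so the explicit `.write(true)` is redundant (I believe newer clippy versions flag this). A minimal sketch of an append-only open; the path is just a placeholder, not the actual egglog code:

```rust
use std::fs::File;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // `.append(true)` implies write access, so no separate `.write(true)` is needed.
    let mut f = File::options()
        .append(true)
        .create(true)
        .open("example.log")?; // placeholder path
    writeln!(f, "appended line")?;
    Ok(())
}
```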
-    stack: &mut Vec<Value>,
+    stack: &mut [Value],
nit update from rust upgrade
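(For anyone unfamiliar with this lint: taking `&mut [Value]` instead of `&mut Vec<Value>` accepts any mutable slice and signals that the function only mutates elements in place rather than growing or shrinking the vector. A standalone illustration with a hypothetical helper, not the actual egglog function:)

```rust
// Hypothetical helper: it only mutates elements in place,
// so a mutable slice is the more general parameter type.
fn zero_out(stack: &mut [i64]) {
    for v in stack.iter_mut() {
        *v = 0;
    }
}

fn main() {
    let mut stack = vec![1, 2, 3];
    // A `&mut Vec<i64>` coerces to `&mut [i64]` automatically.
    zero_out(&mut stack);
    assert_eq!(stack, vec![0, 0, 0]);
}
```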
channel = "1.74.0" | ||
channel = "1.79.0" |
had to update the toolchain for the codspeed dep
-    egraph.fact_directory = args.fact_directory.clone();
+    egraph.fact_directory.clone_from(&args.fact_directory);
nit update from rust upgrade
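(Context on this lint: `clone_from` can reuse the destination's existing allocation instead of dropping it and allocating a fresh clone. A small standalone example, not the egglog code:)

```rust
fn main() {
    let source = String::from("fact-dir");
    let mut dest = String::with_capacity(64);

    // Equivalent to `dest = source.clone()`, but may reuse `dest`'s
    // existing buffer instead of allocating a new one.
    dest.clone_from(&source);
    assert_eq!(dest, "fact-dir");
}
```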
-    .map(|s| s.clone())
+    .cloned()
nit update from rust upgrade
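(Same idea as the other nits: `Iterator::cloned()` is the idiomatic shorthand for `.map(|s| s.clone())` over an iterator of references. A toy example:)

```rust
fn main() {
    let names = vec!["union".to_string(), "rewrite".to_string()];
    // `.cloned()` clones each `&String` into an owned `String`,
    // replacing the equivalent `.map(|s| s.clone())`.
    let owned: Vec<String> = names.iter().cloned().collect();
    assert_eq!(owned, names);
}
```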
This is ready for a review now. There are lots of tradeoffs with how accurate we want benchmarks to be vs their overall time, but I tried to find a good balance for now. Currently, the benchmarks CI task takes ~9 minutes to run. Why does it take so long? Codspeed instruments the code to measure CPU cycles, to get more reliable benchmarks on flaky hardware like that in CI.

In order to keep the timings reasonable, I reduced the number of times the

They look similar enough to probably have reasonable differences in performance based on the shorter one. Here is a rough outline of where time is spent in the benchmarks currently:

I hope that even if this isn't a perfect representation of production tasks, it can still be helpful as we iterate over PRs. If your PR is minor, you can always merge it in without waiting for the benchmarks to finish.
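To make the setup more concrete, here is a rough sketch of what a codspeed-compatible benchmark over the example files could look like. It assumes the `codspeed-criterion-compat` crate (CodSpeed's drop-in replacement for criterion's API) and a hypothetical `run_example` entry point; the actual harness in this PR may differ in file layout and naming:

```rust
// benches/examples.rs -- hypothetical file name.
// Assumes `codspeed-criterion-compat` under [dev-dependencies]; it re-exports
// criterion's API so the same benchmark runs locally and under CodSpeed in CI.
use codspeed_criterion_compat::{criterion_group, criterion_main, Criterion};
use std::fs;

// Hypothetical stand-in for running one .egg example file end to end.
fn run_example(source: &str) {
    // In the real benchmark this would parse and run the program in an EGraph.
    std::hint::black_box(source.len());
}

fn benchmark_examples(c: &mut Criterion) {
    // Register one benchmark per example file found in the tests directory.
    for entry in fs::read_dir("tests").expect("tests dir") {
        let path = entry.expect("dir entry").path();
        if path.extension().is_some_and(|e| e == "egg") {
            let name = path.file_stem().unwrap().to_string_lossy().into_owned();
            let source = fs::read_to_string(&path).expect("read example");
            c.bench_function(&name, |b| b.iter(|| run_example(&source)));
        }
    }
}

criterion_group!(benches, benchmark_examples);
criterion_main!(benches);
```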
I had increased the "threshold" for codspeed to say whether the performance change was meaningful to 10%. I brought it back down to 5% so it can alert us to more changes: I think the main thing it affects is the comments on PRs. If you click on the report, you can see a granular breakdown of all changes. |
@saulshanabrook Are you sure about this last change? The website says that the default is 10%, and 10% seems more in line with the variance that we're seeing.
Yeah, I'm open to that! It just seems like a 5% speedup on the long-running benchmarks is actually rather significant... and if we are trying to improve them, then we might need many smaller performance improvements to get there. I don't see much variability at all on the larger benchmarks.
This PR adds benchmarking to CI by running all of our example files with codspeed.
This will comment on new PRs and say how much they affect the performance of running the example files.
Hopefully, it should not only help us see how PRs affect performance but also give us some profiling through codspeed to diagnose where things speed up or slow down.