Benchmark examples with codspeed #443

Merged: 19 commits into egraphs-good:main from codspeed, Oct 16, 2024

Conversation

saulshanabrook (Member) commented Oct 13, 2024

This PR adds benchmarking to CI by running all of our example files with codspeed.

It will comment on new PRs with how much they affect the performance of running the example files.

Hopefully, it will not only help us see how PRs affect performance but also give us profiling through codspeed to diagnose where things speed up or slow down.
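For concreteness, here is a minimal sketch of what such a harness can look like. The codspeed-criterion-compat crate is real (CodSpeed's drop-in replacement for criterion), but the egglog calls (`EGraph::default`, `parse_and_run_program`) and the tests/ layout are assumptions for illustration, not a copy of this PR's benches/examples.rs:

```rust
// Sketch of a codspeed benchmark harness over the example files.
use codspeed_criterion_compat::{criterion_group, criterion_main, Criterion};

fn run_example(source: &str) {
    // Assumed entry point: parse and run one example program end to end.
    let mut egraph = egglog::EGraph::default();
    egraph
        .parse_and_run_program(source)
        .expect("example should run without errors");
}

fn benchmark_examples(c: &mut Criterion) {
    // Register each .egg file under tests/ as its own named benchmark.
    for entry in std::fs::read_dir("tests").unwrap() {
        let path = entry.unwrap().path();
        if path.extension().and_then(|e| e.to_str()) != Some("egg") {
            continue;
        }
        let name = path.file_stem().unwrap().to_string_lossy().to_string();
        let source = std::fs::read_to_string(&path).unwrap();
        c.bench_function(&name, |b| b.iter(|| run_example(&source)));
    }
}

criterion_group!(benches, benchmark_examples);
criterion_main!(benches);
```

Registering each file as its own named benchmark is what produces the per-example entries in the report below.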

saulshanabrook requested a review from a team as a code owner October 13, 2024 01:07
saulshanabrook requested review from ajpal and removed request for a team October 13, 2024 01:07
saulshanabrook (Member, Author) commented:

I think this needs an admin on the egraphs-good org to approve the app in codspeed; I had to "request" adding it to this org.

I would also need admin permissions on this repo to add the CODSPEED_TOKEN secret, or someone who already has them can create one and add it here!

saulshanabrook (Member, Author) commented Oct 14, 2024

In response to some questions asked offline about the downsides of adding codspeed:

It introduces some more complexity overall in terms of depending on an external service, and will add some more noise to PRs about their performance impacts. It may also increase CI time, from maybe 2 minutes to 4 minutes. If we find this too detrimental, we can exclude some of our longer-running examples from benchmarking.

Performance is also not an exact science, so it has the potential to give false assurances, when in fact the impact might differ in another environment. It also means we are judging performance based on the example files, which might not be representative of performance-sensitive workloads.

Overall, though, I would judge the additional signal to be worth those costs, in helping us assess the impact of PRs on performance, especially those meant to make things faster. I think it will also encourage us to keep representative example files.

I have found codspeed invaluable in the Python package for diagnosing the impact of large changes on end-to-end performance, and it has helped me refine those changes so that the impacts are reasonable. Its built-in flamegraphs have also helped me narrow down where the slowdowns occur and spot upstream slowdowns from egglog changes. So adding this instrumentation closer to the source would help us get that information sooner.

codspeed-hq (bot) commented Oct 15, 2024

CodSpeed Performance Report

Congrats! CodSpeed is installed 🎉

🆕 87 new benchmarks were detected.

You will start to see performance impacts in the reports once the benchmarks are run from your default branch.

Detected benchmarks

  • antiunify (2.1 ms)
  • array (26.8 ms)
  • bdd (15.2 ms)
  • before-proofs (1.6 ms)
  • birewrite (1.3 ms)
  • bitwise (678.3 µs)
  • bool (1.4 ms)
  • calc (5.4 ms)
  • combinators (18.4 ms)
  • combined-nested (1 ms)
  • container-rebuild (2 ms)
  • cyk (11.2 ms)
  • cykjson (338.4 ms)
  • datatypes (582.1 µs)
  • delete (642.5 µs)
  • eggcc-extraction (5.5 s)
  • eqsat-basic (1.6 ms)
  • eqsolve (31.1 ms)
  • f64 (935.7 µs)
  • fail_wrong_assertion (1.3 ms)
  • fibonacci (1.6 ms)
  • fibonacci-demand (2.1 ms)
  • fusion (43.7 ms)
  • herbie (286.2 ms)
  • herbie-tutorial (12.7 ms)
  • i64 (350.2 µs)
  • include (1.2 ms)
  • integer_math (12.2 ms)
  • intersection (1.9 ms)
  • interval (2.7 ms)
  • knapsack (5.7 ms)
  • lambda (145 ms)
  • levenshtein-distance (15.1 ms)
  • list (4.8 ms)
  • map (650.5 µs)
  • math (36.7 ms)
  • math-microbenchmark (4.2 s)
  • matrix (11.5 ms)
  • merge-during-rebuild (1 ms)
  • merge-saturates (3 ms)
  • name-resolution (1.2 ms)
  • path (1 ms)
  • path-union (1.1 ms)
  • pathproof (1.5 ms)
  • points-to (2 ms)
  • primitives (523.4 µs)
  • prims (5.4 ms)
  • push-pop (648 µs)
  • rational (890.2 µs)
  • repro-define (693.2 µs)
  • repro-desugar-143 (8.6 ms)
  • repro-empty-query (618.4 µs)
  • repro-equal-constant (670 µs)
  • repro-equal-constant2 (654.4 µs)
  • repro-noteqbug (745.2 µs)
  • repro-primitive-query (681.2 µs)
  • repro-querybug (929.3 µs)
  • repro-querybug2 (669.1 µs)
  • repro-querybug3 (2 ms)
  • repro-querybug4 (711.8 µs)
  • repro-should-saturate (650.2 µs)
  • repro-silly-panic (918.8 µs)
  • repro-typechecking-schedule (462.1 µs)
  • repro-unsound (269.3 ms)
  • repro-unsound-htutorial (847.3 µs)
  • repro-vec-unequal (768.6 µs)
  • resolution (4.1 ms)
  • rw-analysis (39.8 ms)
  • schedule-demo (2.2 ms)
  • semi_naive_set_function (41.5 ms)
  • set (2.7 ms)
  • stratified (901.7 µs)
  • string (539.3 µs)
  • string_quotes (477.1 µs)
  • subsume (1.5 ms)
  • test-combined (1.3 ms)
  • test-combined-steps (2.9 ms)
  • towers-of-hanoi (5.3 ms)
  • tricky-type-checking (11.6 ms)
  • type-constraints-tests (578.4 µs)
  • typecheck (5.8 ms)
  • typeinfer (400.7 ms)
  • unification-points-to (7.8 ms)
  • unify (1.1 ms)
  • unstable-fn (5.7 ms)
  • until (2.7 ms)
  • vec (975.2 µs)

```diff
@@ -1293,7 +1293,6 @@ impl EGraph {
     filename.push(file.as_str());
     // append to file
     let mut f = File::options()
-        .write(true)
```
saulshanabrook (Member, Author) commented on this diff:

nit: update from the Rust upgrade (clippy's `ineffective_open_options` lint: `.append(true)` already implies write access, so the explicit `.write(true)` was redundant)
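As a side illustration (not code from this PR; the file name is hypothetical), the lint fires because append mode alone already opens the file for writing:

```rust
use std::fs::File;
use std::io::Write;

fn main() -> std::io::Result<()> {
    // .write(true).append(true) is redundant: append(true) already
    // implies write access, so write(true) is a no-op.
    let mut f = File::options().append(true).create(true).open("log.txt")?;
    writeln!(f, "appended line")?;
    Ok(())
}
```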

```diff
-    stack: &mut Vec<Value>,
+    stack: &mut [Value],
```
saulshanabrook (Member, Author) commented on this diff:

nit: update from the Rust upgrade (a `&mut [Value]` parameter is more general than `&mut Vec<Value>` when the function never resizes the vector)
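A quick illustration (not code from this PR; it uses i64 in place of the crate's Value type) of why the slice parameter is preferable:

```rust
// A &mut [T] parameter accepts any mutable contiguous buffer, not just a
// Vec, as long as the function never grows or shrinks the collection.
fn negate_all(stack: &mut [i64]) {
    for v in stack.iter_mut() {
        *v = -*v;
    }
}

fn main() {
    let mut from_vec = vec![1, 2, 3];
    negate_all(&mut from_vec); // works with a Vec...

    let mut from_array = [4, 5, 6];
    negate_all(&mut from_array); // ...and with a plain array
    assert_eq!(from_array, [-4, -5, -6]);
}
```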

channel = "1.74.0"
channel = "1.79.0"
saulshanabrook (Member, Author) commented on this diff:

had to update the toolchain for the codspeed dep

```diff
-    egraph.fact_directory = args.fact_directory.clone();
+    egraph.fact_directory.clone_from(&args.fact_directory);
```
saulshanabrook (Member, Author) commented on this diff:

nit: update from the Rust upgrade (clippy's `assigning_clones` lint: `clone_from` can reuse the destination's existing allocation)
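A quick illustration (not code from this PR) of what `clone_from` buys, using String in place of the fact_directory field:

```rust
fn main() {
    let src = String::from("fact-dir");
    let mut dst = String::with_capacity(64);
    let cap_before = dst.capacity();

    // dst = src.clone() would allocate a fresh buffer and drop the old one;
    // clone_from copies into dst's existing buffer when it is large enough.
    dst.clone_from(&src);
    assert_eq!(dst, "fact-dir");
    assert_eq!(dst.capacity(), cap_before); // allocation was reused
}
```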

```diff
-    .map(|s| s.clone())
+    .cloned()
```
saulshanabrook (Member, Author) commented on this diff:

nit: update from the Rust upgrade (clippy's `map_clone` lint: `.cloned()` is the idiomatic form of `.map(|s| s.clone())`)
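A quick illustration (not code from this PR):

```rust
fn main() {
    // .cloned() on an iterator over references is equivalent to
    // .map(|s| s.clone()), just shorter and clearer about intent.
    let names = [String::from("math"), String::from("lambda")];
    let copies: Vec<String> = names.iter().cloned().collect();
    assert_eq!(copies, names);
}
```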

saulshanabrook (Member, Author) commented:

This is ready for review now. There are lots of tradeoffs between how accurate we want the benchmarks to be and their overall running time, but I tried to find a good balance for now.

Currently, the benchmarks CI task takes ~9 minutes to run. Why does it take so long? CodSpeed instruments the code to measure CPU cycles, which gives more reliable benchmarks on flaky hardware like that in CI:

CodSpeed instruments your benchmarks to measure the performance of your code. A benchmark will be run only once and the CPU behavior will be simulated. This ensures that the measurement is as accurate as possible, taking into account not only the instructions executed but also the cache and memory access patterns. The simulation gives us an equivalent of the CPU cycles that includes cache and memory access.

To keep the timings reasonable, I reduced the number of times the eggcc-extraction schedules run from 5 to 2. At 2 runs, that benchmark alone takes ~4 minutes; if I bump it even to 3 runs, it takes 46 minutes, which seems too long. One open question is whether eggcc-extraction does similar enough work at 2 runs as opposed to 3. Here is a screenshot of the two profiles, the 3-run one on the left (taking around 10x as long) and the 2-run one on the right:

[Screenshot: side-by-side profiles of eggcc-extraction with 3 runs (left) and 2 runs (right)]

They look similar enough that performance differences measured on the shorter run should be reasonably representative.

Here is a rough outline of where time is spent in the benchmarks currently:

[Chart: rough breakdown of where time is spent across the benchmarks]

I hope that even if this isn't a perfect representation of production tasks, it can still be helpful as we iterate on PRs. If your PR is minor, you can always merge it without waiting for the benchmarks to finish.

saulshanabrook changed the title from "Benchmark all examples with codspeed" to "Benchmark examples with codspeed" Oct 15, 2024
saulshanabrook requested review from yihozhang and removed request for ajpal October 15, 2024 21:30
Three resolved review threads on benches/examples.rs (outdated).
saulshanabrook merged commit 34b5d7b into egraphs-good:main Oct 16, 2024
4 checks passed
saulshanabrook deleted the codspeed branch October 16, 2024 00:57
saulshanabrook (Member, Author) commented:

I had increased codspeed's "threshold", the cutoff for reporting a performance change as meaningful, to 10%. I brought it back down to 5% so it can alert us to more changes:

[Screenshot: CodSpeed settings with the threshold set back to 5%]

I think the main thing it affects is the comments on PRs. If you click on the report, you can see a granular breakdown of all changes.

Alex-Fischman (Contributor) commented:

@saulshanabrook Are you sure about this last change? The website says that the default is 10%, and 10% seems more in line with the variance that we're seeing.

saulshanabrook (Member, Author) commented:

Yeah, I'm open to that! It just seems like a 5% speedup on the long-running benchmarks is actually rather significant... And if we are trying to improve them, we might need many smaller performance improvements to get there. I don't see much variability at all on the larger benchmarks.
