Prompt Benchmarks

Docs · Website · Twitter · Discord · Quickstart · Online Playground

This repo contains benchmarks for tscircuit system prompts used for automatically generating tscircuit code.

Running Benchmarks

Use the command bun run benchmark to select and run a benchmark. A single prompt takes about 10-15 seconds to run with Sonnet. The benchmarks run against a set of samples (see the tests/samples directory).

When you change a prompt, you must run the benchmark for that prompt to update its benchmark snapshot. The snapshot is how we record degradation or improvement in response quality.

Each sample is run 5 times, and two tests are run:

1. Does the output from the prompt compile?
2. Does the output produce the expected circuit?

The benchmark shows the percentage of samples that pass (1) and (2).
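As a rough illustration of how such percentages could be derived, here is a minimal TypeScript sketch. It is not the repo's actual implementation: the RunResult type, its field names, and the passRate helper are hypothetical, and it assumes that a run only counts toward test (2) if it also compiles.

```typescript
// Hypothetical shape of one run of one sample; field names are illustrative only.
interface RunResult {
  compiles: boolean // test (1): did the generated code compile?
  matchesExpectedCircuit: boolean // test (2): did it produce the expected circuit?
}

// Percentage (0-100) of runs that satisfy the given check.
function passRate(
  runs: RunResult[],
  check: (run: RunResult) => boolean,
): number {
  if (runs.length === 0) return 0
  const passed = runs.filter(check).length
  return (passed * 100) / runs.length
}

// Made-up results for the 5 runs of a single sample.
const runs: RunResult[] = [
  { compiles: true, matchesExpectedCircuit: true },
  { compiles: true, matchesExpectedCircuit: true },
  { compiles: true, matchesExpectedCircuit: false },
  { compiles: false, matchesExpectedCircuit: false },
  { compiles: true, matchesExpectedCircuit: true },
]

const compileRate = passRate(runs, (run) => run.compiles)
// Assumption: a run must compile before it can match the expected circuit.
const circuitRate = passRate(
  runs,
  (run) => run.compiles && run.matchesExpectedCircuit,
)

console.log(`compiles: ${compileRate}%, expected circuit: ${circuitRate}%`)
```

Comparing these two numbers before and after a prompt change is what makes a regression or improvement visible in the snapshot.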
