How to Track Benchmarks in CI with Bencher
Most benchmark results are ephemeral. They disappear as soon as your terminal reaches its scrollback limit. Some benchmark harnesses let you cache results, but most only do so locally. Bencher allows you to track your benchmarks from both local and CI runs and compare the results, while still using your favorite benchmark harness.
There are three popular ways to compare benchmark results in Continuous Benchmarking, that is, benchmarking in CI:
- Statistical Continuous Benchmarking
  - Track benchmark results over time to create a baseline
  - Use this baseline along with a Statistical Threshold to create a statistical boundary
  - Compare the new results against this statistical boundary to detect performance regressions
- Relative Continuous Benchmarking
  - Run the benchmarks for the current baseline code
  - Switch over to the new version of the code
  - Use a Percentage Threshold to create a boundary for the baseline code
  - Run the benchmarks for the new version of the code
  - Compare the new version of the code results against the baseline code results to detect performance regressions
- Change Point Detection
  - Occasionally run the benchmarks for new versions of the code
  - Use a change point detection algorithm to detect performance regressions
  - Bisect to find the commit that introduced the performance regression
Statistical Continuous Benchmarking
Picking up where we left off in the
Quick Start and Docker Self-Hosted tutorials,
let’s add Statistical Continuous Benchmarking to our claimed project.
🐰 Make sure you have created an API token and set it as the
`BENCHER_API_TOKEN` environment variable before continuing on!
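For example, in a local shell you might export the token directly; in CI, inject it from your provider's secret store instead:

```
# Replace the placeholder with your own Bencher API token.
# In CI, set this from an encrypted secret rather than hard-coding it.
export BENCHER_API_TOKEN="YOUR_API_TOKEN"
```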
Now we are ready to run our benchmarks in CI. Because every CI environment is a little bit different, the following example is meant to be more illustrative than practical. For more specific examples, see Continuous Benchmarking in GitHub Actions and Continuous Benchmarking in GitLab CI/CD.
First, we need to create and maintain a historical baseline for our main branch by benchmarking every change in CI:
```
bencher run \
--project project-abc4567-wxyz123456789 \
--branch main \
--testbed ci-runner \
--threshold-measure latency \
--threshold-test t_test \
--threshold-max-sample-size 64 \
--threshold-upper-boundary 0.99 \
--thresholds-reset \
--err \
--adapter json \
bencher mock
```

- Use the `bencher run` CLI subcommand to run your `main` branch benchmarks. See the `bencher run` CLI subcommand for a full overview. (ex: `bencher run`)
- Set the `--project` option to the Project slug. See the `--project` docs for more details. (ex: `--project project-abc4567-wxyz123456789`)
- Set the `--branch` option to the base Branch name. See the `--branch` docs for a full overview. (ex: `--branch main`)
- Set the `--testbed` option to the CI runner Testbed name. See the `--testbed` docs for more details. (ex: `--testbed ci-runner`)
- Set the Threshold for the `main` Branch, `ci-runner` Testbed, and `latency` Measure:
  - Set the `--threshold-measure` option to the built-in `latency` Measure that is generated by `bencher mock`. See the `--threshold-measure` docs for more details. (ex: `--threshold-measure latency`)
  - Set the `--threshold-test` option to a Student's t-test (`t_test`). See the `--threshold-test` docs for a full overview. (ex: `--threshold-test t_test`)
  - Set the `--threshold-max-sample-size` option to the maximum sample size of `64`. See the `--threshold-max-sample-size` docs for more details. (ex: `--threshold-max-sample-size 64`)
  - Set the `--threshold-upper-boundary` option to the Upper Boundary of `0.99`. See the `--threshold-upper-boundary` docs for more details. (ex: `--threshold-upper-boundary 0.99`)
  - Set the `--thresholds-reset` flag so that only the specified Threshold is active. See the `--thresholds-reset` docs for a full overview. (ex: `--thresholds-reset`)
- Set the `--err` flag to fail the command if an Alert is generated. See the `--err` docs for a full overview. (ex: `--err`)
- Set the `--adapter` option to Bencher Metric Format JSON (`json`) that is generated by `bencher mock`. See benchmark harness adapters for a full overview. (ex: `--adapter json`)
- Specify the benchmark command arguments. See benchmark command for a full overview. (ex: `bencher mock`)
The first time this command is run in CI,
it will create the main Branch if it does not exist yet.
The new main will not have a start point or existing data.
A Threshold will be created for the main Branch, ci-runner Testbed, and latency Measure.
On subsequent runs, new data will be added to the main Branch.
The specified Threshold will then be used to detect performance regressions.
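As an illustrative sketch, this baseline step might be gated so it only runs for pushes to main. The `CI_BRANCH` variable here is a stand-in, not a real Bencher or shell builtin: for example, GitHub Actions exposes `GITHUB_REF_NAME` and GitLab CI/CD exposes `CI_COMMIT_BRANCH`.

```
# Hypothetical gate: only create baseline data for the main branch.
# CI_BRANCH is a placeholder for your CI provider's branch variable.
if [ "${CI_BRANCH:-}" = "main" ]; then
  bencher run \
    --project project-abc4567-wxyz123456789 \
    --branch main \
    --testbed ci-runner \
    --threshold-measure latency \
    --threshold-test t_test \
    --threshold-max-sample-size 64 \
    --threshold-upper-boundary 0.99 \
    --thresholds-reset \
    --err \
    --adapter json \
    bencher mock
fi
```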
Now, we are ready to catch performance regressions in CI.
This is how we would track the performance of a new feature branch in CI, aptly named feature-branch:
```
bencher run \
--project project-abc4567-wxyz123456789 \
--branch feature-branch \
--start-point main \
--start-point-hash 32aea434d751648726097ed3ac760b57107edd8b \
--start-point-clone-thresholds \
--start-point-reset \
--testbed ci-runner \
--err \
--adapter json \
bencher mock
```

- Use the `bencher run` CLI subcommand to run your `feature-branch` branch benchmarks. See the `bencher run` CLI subcommand for a full overview. (ex: `bencher run`)
- Set the `--project` option to the Project slug. See the `--project` docs for more details. (ex: `--project project-abc4567-wxyz123456789`)
- Set the `--branch` option to the feature Branch name. See the `--branch` docs for a full overview. (ex: `--branch feature-branch`)
- Set the Start Point for the `feature-branch` Branch:
  - Set the `--start-point` option to the feature Branch start point. See the `--start-point` docs for a full overview. (ex: `--start-point main`)
  - Set the `--start-point-hash` option to the feature Branch start point `git` hash. See the `--start-point-hash` docs for a full overview. (ex: `--start-point-hash 32ae...dd8b`)
  - Set the `--start-point-clone-thresholds` flag to clone the Thresholds from the start point. See the `--start-point-clone-thresholds` docs for a full overview. (ex: `--start-point-clone-thresholds`)
  - Set the `--start-point-reset` flag to always reset the Branch to the start point. This will prevent benchmark data drift. See the `--start-point-reset` docs for a full overview. (ex: `--start-point-reset`)
- Set the `--testbed` option to the Testbed name. See the `--testbed` docs for more details. (ex: `--testbed ci-runner`)
- Set the `--err` flag to fail the command if an Alert is generated. See the `--err` docs for a full overview. (ex: `--err`)
- Set the `--adapter` option to Bencher Metric Format JSON (`json`) that is generated by `bencher mock`. See benchmark harness adapters for a full overview. (ex: `--adapter json`)
- Specify the benchmark command arguments. See benchmark command for a full overview. (ex: `bencher mock`)
The first time this command is run in CI,
Bencher will create the feature-branch Branch since it does not exist yet.
The new feature-branch will use the main Branch
at hash 32aea434d751648726097ed3ac760b57107edd8b as its start point.
This means that feature-branch will have a copy of all the data and Thresholds
from the main Branch to compare the results of bencher mock against.
On all subsequent runs, Bencher will reset the feature-branch Branch to the start point,
and use the main Branch data and Thresholds to detect performance regressions.
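In a real pipeline, you would likely not hard-code the start point hash. Here is a minimal sketch, assuming the CI checkout has full git history and an origin/main ref, that derives both the feature branch name and the start point hash from git:

```
# Derive the feature branch name and its fork point from git.
# Assumes a full-history checkout so merge-base can resolve.
FEATURE_BRANCH="$(git rev-parse --abbrev-ref HEAD)"
START_POINT_HASH="$(git merge-base origin/main HEAD)"

bencher run \
  --project project-abc4567-wxyz123456789 \
  --branch "$FEATURE_BRANCH" \
  --start-point main \
  --start-point-hash "$START_POINT_HASH" \
  --start-point-clone-thresholds \
  --start-point-reset \
  --testbed ci-runner \
  --err \
  --adapter json \
  bencher mock
```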
Relative Continuous Benchmarking
Picking up where we left off in the
Quick Start and Docker Self-Hosted tutorials,
let’s add Relative Continuous Benchmarking to our claimed project.
🐰 Make sure you have created an API token and set it as the
`BENCHER_API_TOKEN` environment variable before continuing on!
Relative Continuous Benchmarking runs a side-by-side comparison of two versions of your code.
This can be useful when dealing with noisy CI/CD environments,
where the resources available can be highly variable between runs.
In this example we will be comparing the results from running on the main branch
to results from running on a feature branch, aptly named feature-branch.
Because every CI environment is a little bit different,
the following example is meant to be more illustrative than practical.
For more specific examples, see Continuous Benchmarking in GitHub Actions
and Continuous Benchmarking in GitLab CI/CD.
First, we need to check out the main branch with git in CI:
```
git checkout main
```

Then we need to run our benchmarks on the main branch in CI:
```
bencher run \
--project project-abc4567-wxyz123456789 \
--branch main \
--start-point-reset \
--testbed ci-runner \
--adapter json \
bencher mock
```

- Use the `bencher run` CLI subcommand to run your `main` branch benchmarks. See the `bencher run` CLI subcommand for a full overview. (ex: `bencher run`)
- Set the `--project` option to the Project slug. See the `--project` docs for more details. (ex: `--project project-abc4567-wxyz123456789`)
- Set the `--branch` option to the base Branch name. See the `--branch` docs for a full overview. (ex: `--branch main`)
- Set the `--start-point-reset` flag to always reset the base Branch. This will make sure that all of the benchmark data is from the current CI runner. See the `--start-point-reset` docs for a full overview. (ex: `--start-point-reset`)
- Set the `--testbed` option to the CI runner Testbed name. See the `--testbed` docs for more details. (ex: `--testbed ci-runner`)
- Set the `--adapter` option to Bencher Metric Format JSON (`json`) that is generated by `bencher mock`. See benchmark harness adapters for a full overview. (ex: `--adapter json`)
- Specify the benchmark command arguments. See benchmark command for a full overview. (ex: `bencher mock`)
The first time this command is run in CI,
it will create the main Branch since it does not exist yet.
The new main will not have a start point, existing data, or Thresholds.
On subsequent runs, the old main Head will be replaced
and a new main Head will be created without a start point, existing data, or Thresholds.
Next, we need to check out the feature-branch branch with git in CI:

```
git checkout feature-branch
```

Finally, we are ready to run our feature-branch benchmarks in CI:
```
bencher run \
--project project-abc4567-wxyz123456789 \
--branch feature-branch \
--start-point main \
--start-point-reset \
--testbed ci-runner \
--threshold-measure latency \
--threshold-test percentage \
--threshold-upper-boundary 0.25 \
--thresholds-reset \
--err \
--adapter json \
bencher mock
```

- Use the `bencher run` CLI subcommand to run your `feature-branch` benchmarks. See the `bencher run` CLI subcommand for a full overview. (ex: `bencher run`)
- Set the `--project` option to the Project slug. See the `--project` docs for more details. (ex: `--project project-abc4567-wxyz123456789`)
- Set the `--branch` option to the feature branch Branch name. See the `--branch` docs for a full overview. (ex: `--branch feature-branch`)
- Set the Start Point for the `feature-branch` Branch:
  - Set the `--start-point` option to the feature Branch start point. See the `--start-point` docs for a full overview. (ex: `--start-point main`)
  - Set the `--start-point-reset` flag to always reset the Branch to the start point. This will use only the latest relative benchmark results. See the `--start-point-reset` docs for a full overview. (ex: `--start-point-reset`)
- Set the `--testbed` option to the CI runner Testbed name. See the `--testbed` docs for more details. (ex: `--testbed ci-runner`)
- Set the Threshold for the `feature-branch` Branch, `ci-runner` Testbed, and `latency` Measure:
  - Set the `--threshold-measure` option to the built-in `latency` Measure that is generated by `bencher mock`. See the `--threshold-measure` docs for more details. (ex: `--threshold-measure latency`)
  - Set the `--threshold-test` option to a basic percentage (`percentage`). See the `--threshold-test` docs for a full overview. (ex: `--threshold-test percentage`)
  - Set the `--threshold-upper-boundary` option to the Upper Boundary of `0.25`. See the `--threshold-upper-boundary` docs for more details. (ex: `--threshold-upper-boundary 0.25`)
  - Set the `--thresholds-reset` flag so that only the specified Threshold is active. See the `--thresholds-reset` docs for a full overview. (ex: `--thresholds-reset`)
- Set the `--err` flag to fail the command if an Alert is generated. See the `--err` docs for a full overview. (ex: `--err`)
- Set the `--adapter` option to Bencher Metric Format JSON (`json`) that is generated by `bencher mock`. See benchmark harness adapters for a full overview. (ex: `--adapter json`)
- Specify the benchmark command arguments. See benchmark command for a full overview. (ex: `bencher mock`)
Every time this command is run in CI,
it compares the results from feature-branch against only the most recent results from main.
The specified Threshold is then used to detect performance regressions.
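To tie the steps together, here is the whole relative flow as one illustrative CI script, assuming both branches are available in the checkout and BENCHER_API_TOKEN is already set:

```
#!/usr/bin/env bash
set -euo pipefail

# Step 1: benchmark the baseline on a freshly reset main Head
git checkout main
bencher run \
  --project project-abc4567-wxyz123456789 \
  --branch main \
  --start-point-reset \
  --testbed ci-runner \
  --adapter json \
  bencher mock

# Step 2: benchmark the candidate against that fresh baseline
git checkout feature-branch
bencher run \
  --project project-abc4567-wxyz123456789 \
  --branch feature-branch \
  --start-point main \
  --start-point-reset \
  --testbed ci-runner \
  --threshold-measure latency \
  --threshold-test percentage \
  --threshold-upper-boundary 0.25 \
  --thresholds-reset \
  --err \
  --adapter json \
  bencher mock
```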
Change Point Detection
Change Point Detection uses a change point algorithm to evaluate a large window of recent results. This allows the algorithm to ignore outliers as noise and produce fewer false positives. Even though Change Point Detection is considered continuous benchmarking, it does not allow you to detect performance regressions in CI. That is, you cannot detect a performance regression before a feature branch merges. This is sometimes referred to as “out-of-band” detection.
For example, suppose you have a benchmark bench_my_critical_path
with the following historical latencies: 5 ms, 6 ms, 5 ms, 5 ms, 7 ms.
If the next benchmark result was 11 ms, then a Statistical Continuous Benchmarking threshold
and a Change Point Detection algorithm would interpret things very differently.
The threshold would likely be exceeded, and an alert would be generated.
If this benchmark run was tied to a pull request,
the build would likely be set to fail due to this alert.
However, the change point algorithm wouldn’t do anything… yet.
If things dropped back down to 5 ms on the next run, then it would probably not generate an alert.
Conversely, if the next run or two resulted in 10 ms and 12 ms,
only then would the change point algorithm trigger an alert.
Are you interested in using Change Point Detection with Bencher? If so, please leave a comment on the tracking issue or reach out to us directly.
🐰 Congrats! You have learned how to track benchmarks in CI with Bencher! 🎉