How to Track Benchmarks in CI with Bencher


Most benchmark results are ephemeral. They disappear as soon as your terminal reaches its scrollback limit. Some benchmark harnesses let you cache results, but most only do so locally. Bencher allows you to track your benchmarks from both local and CI runs and compare the results, while still using your favorite benchmark harness.

There are two popular ways to compare benchmark results when practicing Continuous Benchmarking, that is, benchmarking in CI:

  • Statistical Continuous Benchmarking
    1. Track benchmark results over time to create a baseline
    2. Use this baseline along with Statistical Thresholds to create a statistical boundary
    3. Compare the new results against this statistical boundary to detect performance regressions
  • Relative Continuous Benchmarking
    1. Run the benchmarks for the current baseline code
    2. Use Percentage Thresholds to create a boundary for the baseline code
    3. Switch over to the new version of the code
    4. Run the benchmarks for the new version of the code
    5. Compare the new version of the code results against the baseline code results to detect performance regressions

Statistical Continuous Benchmarking

Picking up where we left off in the Quick Start and Docker Self-Hosted tutorials, let’s add Statistical Continuous Benchmarking to our Save Walter White project.

🐰 Make sure you have created an API token and set it as the BENCHER_API_TOKEN environment variable before continuing on!

First, we need to create a new Testbed to represent our CI runners, aptly named ci-runner.

bencher testbed create \
--name ci-runner \
save-walter-white-1234abcd
  1. Use the bencher testbed create CLI subcommand. See the testbed create docs for more details. (ex: bencher testbed create)
  2. Set the --name option to the desired Testbed name. (ex: --name ci-runner)
  3. Specify the project argument as the Save Walter White project slug. (ex: save-walter-white-1234abcd)

Next, we need to create a new Threshold for our ci-runner Testbed:

bencher threshold create \
--branch main \
--testbed ci-runner \
--measure Latency \
--test t-test \
--upper-boundary 0.95 \
save-walter-white-1234abcd
  1. Use the bencher threshold create CLI subcommand. See the threshold create docs for more details. (ex: bencher threshold create)
  2. Set the --branch option to the default main Branch. (ex: --branch main)
  3. Set the --testbed option to the new ci-runner Testbed. (ex: --testbed ci-runner)
  4. Set the --measure option to the built-in Latency Measure that is generated by bencher mock. See the definition of Measure for details. (ex: --measure Latency)
  5. Set the --test option to a t-test Threshold. See Thresholds & Alerts for a full overview. (ex: --test t-test)
  6. Set the --upper-boundary option to an Upper Boundary of 0.95. See Thresholds & Alerts for a full overview. (ex: --upper-boundary 0.95)
  7. Specify the project argument as the Save Walter White project slug. (ex: save-walter-white-1234abcd)
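To build intuition for what this t-test Threshold does, here is a rough sketch in Python. This is not Bencher's implementation: the latency samples are made up, and it uses a normal approximation where Bencher's t-test also accounts for the baseline's sample size.

```python
import statistics

def statistical_upper_boundary(baseline, confidence=0.95):
    """Illustrative statistical boundary: the historical mean plus the
    one-sided critical value at the given confidence times the historical
    standard deviation. (Normal approximation of a Student's t-test.)"""
    mean = statistics.fmean(baseline)
    stdev = statistics.stdev(baseline)
    z = statistics.NormalDist().inv_cdf(confidence)  # ~1.645 for 0.95
    return mean + z * stdev

def is_regression(baseline, new_latency, confidence=0.95):
    """Alert when a new result lands outside the statistical boundary."""
    return new_latency > statistical_upper_boundary(baseline, confidence)

# Hypothetical Latency results (ns) from previous runs on main:
history = [10.0, 10.2, 9.8, 10.1, 9.9]
print(is_regression(history, 10.5))  # well outside the boundary -> True
print(is_regression(history, 10.1))  # within normal variation -> False
```

With an Upper Boundary of 0.95, roughly 5% of perfectly normal runs will still fall above the boundary, which is why the baseline history matters: more data makes the boundary more trustworthy.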

Now we are ready to run our benchmarks in CI. Because every CI environment is a little bit different, the following example is meant to be more illustrative than practical. For more specific examples, see Continuous Benchmarking in GitHub Actions and Continuous Benchmarking in GitLab CI/CD.

We need to create and maintain a historical baseline for our main branch by benchmarking every change in CI:

bencher run \
--project save-walter-white-1234abcd \
--branch main \
--testbed ci-runner \
--adapter json \
--err \
bencher mock
  1. Use the bencher run CLI subcommand to run your main branch benchmarks. See the bencher run CLI subcommand for a full overview. (ex: bencher run)
  2. Set the --project option to the Project slug. See the --project docs for more details. (ex: --project save-walter-white-1234abcd)
  3. Set the --branch option to the default Branch name. See branch selection for a full overview. (ex: --branch main)
  4. Set the --testbed option to the Testbed name. See the --testbed docs for more details. (ex: --testbed ci-runner)
  5. Set the --adapter option to the desired benchmark harness adapter. See benchmark harness adapters for a full overview. (ex: --adapter json)
  6. Set the --err flag to fail the command if an Alert is generated. See Thresholds & Alerts for a full overview. (ex: --err)
  7. Specify the benchmark command arguments. See benchmark command for a full overview. (ex: bencher mock)

Finally, we are ready to catch performance regressions in CI. This is how we would track the performance of a new feature branch, named feature-branch, in CI:

bencher run \
--project save-walter-white-1234abcd \
--branch feature-branch \
--branch-start-point main \
--branch-start-point-hash 32aea434d751648726097ed3ac760b57107edd8b \
--testbed ci-runner \
--adapter json \
--err \
bencher mock
  1. Use the bencher run CLI subcommand to run your feature-branch branch benchmarks. See the bencher run CLI subcommand for a full overview. (ex: bencher run)
  2. Set the --project option to the Project slug. See the --project docs for more details. (ex: --project save-walter-white-1234abcd)
  3. Set the --branch option to the feature Branch name. See branch selection for a full overview. (ex: --branch feature-branch)
  4. Set the --branch-start-point option to the feature Branch start point. See branch selection for a full overview. (ex: --branch-start-point main)
  5. Set the --branch-start-point-hash option to the feature Branch start point git hash. See branch selection for a full overview. (ex: --branch-start-point-hash 32ae...dd8b)
  6. Set the --testbed option to the Testbed name. See the --testbed docs for more details. (ex: --testbed ci-runner)
  7. Set the --adapter option to the desired benchmark harness adapter. See benchmark harness adapters for a full overview. (ex: --adapter json)
  8. Set the --err flag to fail the command if an Alert is generated. See Thresholds & Alerts for a full overview. (ex: --err)
  9. Specify the benchmark command arguments. See benchmark command for a full overview. (ex: bencher mock)

The first time this command is run in CI, it will create the feature-branch Branch since it does not exist yet. The new feature-branch will use the main Branch at hash 32aea434d751648726097ed3ac760b57107edd8b as its start point. This means that feature-branch will have a copy of all the data and Thresholds from the main Branch to compare the results of bencher mock against, for the first and all subsequent runs.

Relative Continuous Benchmarking

Picking up where we left off in the Quick Start and Docker Self-Hosted tutorials, let’s add Relative Continuous Benchmarking to our Save Walter White project.

🐰 Make sure you have created an API token and set it as the BENCHER_API_TOKEN environment variable before continuing on!

First, we need to create a new Testbed to represent our CI runners, aptly named ci-runner.

bencher testbed create \
--name ci-runner \
save-walter-white-1234abcd
  1. Use the bencher testbed create CLI subcommand. See the testbed create docs for more details. (ex: bencher testbed create)
  2. Set the --name option to the desired Testbed name. (ex: --name ci-runner)
  3. Specify the project argument as the Save Walter White project slug. (ex: save-walter-white-1234abcd)

Relative Continuous Benchmarking runs a side-by-side comparison of two versions of your code. This can be useful when dealing with noisy CI/CD environments, where the resources available can be highly variable between runs. In this example we will be comparing the results from running on the main branch to results from running on a feature branch named feature-branch. Because every CI environment is a little bit different, the following example is meant to be more illustrative than practical. For more specific examples, see Continuous Benchmarking in GitHub Actions and Continuous Benchmarking in GitLab CI/CD.

First, we need to check out the main branch with git in CI:

git checkout main

Then we need to run our benchmarks on the main branch in CI:

bencher run \
--project save-walter-white-1234abcd \
--branch feature-branch \
--branch-reset \
--testbed ci-runner \
--adapter json \
bencher mock
  1. Use the bencher run CLI subcommand to run your main branch benchmarks. See the bencher run CLI subcommand for a full overview. (ex: bencher run)
  2. Set the --project option to the Project slug. See the --project docs for more details. (ex: --project save-walter-white-1234abcd)
  3. Set the --branch option to the feature Branch name. See branch selection for a full overview. (ex: --branch feature-branch)
  4. Set the --branch-reset flag. See branch selection for a full overview. (ex: --branch-reset)
  5. Set the --testbed option to the Testbed name. See the --testbed docs for more details. (ex: --testbed ci-runner)
  6. Set the --adapter option to the desired benchmark harness adapter. See benchmark harness adapters for a full overview. (ex: --adapter json)
  7. Specify the benchmark command arguments. See benchmark command for a full overview. (ex: bencher mock)

The first time this command is run in CI, it will create the feature-branch Branch since it does not exist yet. The new feature-branch will not have a start point, existing data, or Thresholds. On subsequent runs, the old version of feature-branch will be renamed and a new feature-branch will be created without a start point, existing data, or Thresholds.

Next, we need to create a new Threshold in CI for our new feature-branch Branch:

bencher threshold create \
--branch feature-branch \
--testbed ci-runner \
--measure Latency \
--test percentage \
--upper-boundary 0.25 \
save-walter-white-1234abcd
  1. Use the bencher threshold create CLI subcommand. See the threshold create docs for more details. (ex: bencher threshold create)
  2. Set the --branch option to the new feature-branch Branch. (ex: --branch feature-branch)
  3. Set the --testbed option to the ci-runner Testbed. (ex: --testbed ci-runner)
  4. Set the --measure option to the built-in Latency Measure that is generated by bencher mock. See the definition of Measure for details. (ex: --measure Latency)
  5. Set the --test option to a percentage Threshold. See Thresholds & Alerts for a full overview. (ex: --test percentage)
  6. Set the --upper-boundary option to an Upper Boundary of 0.25 (ie 25%). See Thresholds & Alerts for a full overview. (ex: --upper-boundary 0.25)
  7. Specify the project argument as the Save Walter White project slug. (ex: save-walter-white-1234abcd)
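To build intuition for this Percentage Threshold, here is a minimal sketch in Python (the numbers are hypothetical, and this is not Bencher's implementation):

```python
def percentage_upper_boundary(baseline_latency, upper_boundary=0.25):
    """Illustrative percentage boundary: 0.25 places the limit 25%
    above the baseline measurement."""
    return baseline_latency * (1.0 + upper_boundary)

def is_regression(baseline_latency, new_latency, upper_boundary=0.25):
    """Alert when the new result exceeds the percentage boundary."""
    return new_latency > percentage_upper_boundary(baseline_latency, upper_boundary)

# Hypothetical Latency results (ns): baseline code run vs new code run.
print(is_regression(10.0, 12.0))  # 20% slower, inside the 25% boundary -> False
print(is_regression(10.0, 13.0))  # 30% slower, outside the boundary -> True
```

Because the boundary is derived from a single fresh baseline measurement rather than accumulated history, this style tolerates noisy CI runners: both measurements come from the same run on the same machine.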

Then, we need to check out the feature-branch branch with git in CI:

git checkout feature-branch

Finally, we are ready to run our feature-branch benchmarks in CI:

bencher run \
--project save-walter-white-1234abcd \
--branch feature-branch \
--testbed ci-runner \
--adapter json \
--err \
bencher mock
  1. Use the bencher run CLI subcommand to run your feature-branch benchmarks. See the bencher run CLI subcommand for a full overview. (ex: bencher run)
  2. Set the --project option to the Project slug. See the --project docs for more details. (ex: --project save-walter-white-1234abcd)
  3. Set the --branch option to the feature Branch name. See branch selection for a full overview. (ex: --branch feature-branch)
  4. Set the --testbed option to the Testbed name. See the --testbed docs for more details. (ex: --testbed ci-runner)
  5. Set the --adapter option to the desired benchmark harness adapter. See benchmark harness adapters for a full overview. (ex: --adapter json)
  6. Set the --err flag to fail the command if an Alert is generated. See Thresholds & Alerts for a full overview. (ex: --err)
  7. Specify the benchmark command arguments. See benchmark command for a full overview. (ex: bencher mock)

Every time this command is run in CI, it compares the results from feature-branch against only the most recent results from main.
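The contrast with Statistical Continuous Benchmarking can be sketched like so (hypothetical latencies, illustrative only): the statistical flow accumulates a history of baseline results, while the relative flow starts from a clean slate on every run.

```python
# Three hypothetical CI runs, each measuring the baseline code's Latency (ns).
ci_run_latencies = (10.0, 10.2, 9.9)

# Statistical flow: every main run is kept, so the baseline window grows
# and the statistical boundary becomes more trustworthy over time.
statistical_history = []
for latency in ci_run_latencies:
    statistical_history.append(latency)

# Relative flow: --branch-reset discards the prior feature-branch data,
# so each run compares against only its own fresh baseline measurement.
relative_baseline = None
for latency in ci_run_latencies:
    relative_baseline = latency  # reset, then re-measure

print(statistical_history)  # [10.0, 10.2, 9.9] -- full history retained
print(relative_baseline)    # 9.9 -- only the latest measurement survives
```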



🐰 Congrats! You have learned how to track benchmarks in CI with Bencher! 🎉


Add Bencher to GitHub Actions ➡

Add Bencher to GitLab CI/CD ➡



Published: Sat, August 12, 2023 at 4:07:00 PM UTC | Last Updated: Mon, April 1, 2024 at 7:00:00 AM UTC