Benchmarking Overview


Most benchmark results are ephemeral. They disappear as soon as your terminal reaches its scrollback limit. Some benchmark harnesses let you cache results, but most only do so locally. Bencher allows you to track your benchmarks from both local and CI runs and compare against historical results.

The easiest way to track your benchmarks is the bencher run CLI subcommand. It wraps your existing benchmark harness output and generates a Report. This Report is then sent to the Bencher API server, where the benchmark harness output is parsed using a benchmark harness adapter. The benchmark harness adapter detects all of the Benchmarks that are present and their corresponding Metrics. These Benchmarks and Metrics are then saved along with the Report. If there is a Threshold set, then the new Metrics are compared against the historical Metrics for each Benchmark present in the Report. If a regression is detected, then an Alert will be generated.
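As a minimal sketch, a local invocation might look like the following; the project slug, API token, and benchmark command are placeholders, and the exact flags you need depend on your setup:

```bash
# Wrap your existing benchmark command with bencher run.
# --adapter tells Bencher how to parse the harness output
# (here, Rust's built-in libtest bench harness).
bencher run \
    --project my-project-slug \
    --token "$BENCHER_API_TOKEN" \
    --adapter rust_bench \
    "cargo bench"
```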

From here on out we will refer to your “benchmarks” as “performance regression tests” to avoid any confusion.

Benchmarks

A Benchmark is a named performance regression test. If the performance regression test is new to Bencher, then a Benchmark is automatically created. Otherwise, the name of the performance regression test is used as the unique identifier for the Benchmark.

Be careful when changing the name of your performance regression tests. You will need to manually rename the Benchmark in Bencher to match the new name. Otherwise, the renamed performance regression test will be considered a new Benchmark. This same word of caution also applies to moving a performance regression test. Depending on the benchmark harness, the path to the performance regression test may be part of its name.
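For instance, with Rust's built-in libtest bench harness the module path is included in the reported name, so the following (illustrative) output would produce a Benchmark named tests::bench_fibonacci; moving that test out of the tests module would yield a different name and therefore a new Benchmark:

```
test tests::bench_fibonacci ... bench:       1,234 ns/iter (+/- 56)
```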

The only exception to the above caveat is ignoring a Benchmark. See suppressing alerts for a full overview.

Metrics

A Metric is a single, point-in-time performance regression test result. Up to three Values may be collected for a single Metric: value, lower_value, and upper_value. The value is required for all Metrics while the lower_value and upper_value are independently optional. Which Values are collected is determined by the benchmark harness adapter.
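As a sketch, a Metric with all three Values might look like this in the JSON adapter's Bencher Metric Format; the benchmark name and numbers here are made up:

```json
{
    "my_benchmark": {
        "latency": {
            "value": 88.0,
            "lower_value": 87.42,
            "upper_value": 88.88
        }
    }
}
```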

Measures

A Measure is the unit of measurement for a Metric. By default, all Projects start with Latency and Throughput Measures, with units of nanoseconds (ns) and operations per second (ops/s), respectively. The Measure is determined by the benchmark harness adapter.
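In the Bencher Metric Format, the Measure is the key under each benchmark name, so a single run can report Metrics for more than one Measure; the values below are illustrative:

```json
{
    "my_benchmark": {
        "latency": {
            "value": 3200.0
        },
        "throughput": {
            "value": 312500.0
        }
    }
}
```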


Report

A Report is a collection of Benchmarks and their Metrics for a particular Branch and Testbed. Reports are most often generated using the bencher run CLI subcommand. See how to track performance regression tests for a full overview.
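For example, the following run (the names are placeholders) would generate a single Report containing every Benchmark and Metric parsed from the harness output, tied to the feature-x Branch and the ci-runner Testbed:

```bash
bencher run \
    --project my-project-slug \
    --token "$BENCHER_API_TOKEN" \
    --branch feature-x \
    --testbed ci-runner \
    --adapter rust_bench \
    "cargo bench"
```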

Branch

A Branch is the git ref used when running a Report (i.e. branch name or tag). By default, all Projects start with a main Branch. When using the bencher run CLI subcommand, main is the default Branch if one is not provided. See branch selection for a full overview.
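To track results against whatever branch you are currently on, you might pass the git branch name explicitly; this is a sketch, and in CI you would typically use an environment variable such as GITHUB_REF_NAME instead:

```bash
bencher run \
    --project my-project-slug \
    --branch "$(git rev-parse --abbrev-ref HEAD)" \
    --adapter rust_bench \
    "cargo bench"
```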

Start Point

A Branch can have a Start Point. A Start Point is another Branch at a specific version (and git hash, if available). All the Metrics and optionally Thresholds are copied over from the Start Point. See branch selection for a full overview.
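As a hedged sketch, recent versions of the CLI let you set a Start Point when creating a feature Branch; the flag names below are assumptions, so check bencher run --help for your installed version:

```bash
# Create the feature-x Branch from main at a specific git hash,
# copying over main's historical Metrics (flag names are assumptions).
bencher run \
    --project my-project-slug \
    --branch feature-x \
    --start-point main \
    --start-point-hash "$(git rev-parse main)" \
    --adapter rust_bench \
    "cargo bench"
```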

Testbed

A Testbed is the name of the testing environment used when running a Report. By default all Projects start with a localhost Testbed. When using the bencher run CLI subcommand, localhost is the default Testbed if one is not provided.
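Because results from different environments are rarely comparable, it helps to name each environment explicitly; for example, a CI runner might be tracked as its own Testbed (the name below is a placeholder):

```bash
bencher run \
    --project my-project-slug \
    --testbed ci-ubuntu-latest \
    --adapter rust_bench \
    "cargo bench"
```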



🐰 Congrats! You have learned all about tracking your performance regression tests! 🎉


Keep Going: bencher run CLI Subcommand ➡



Published: Sat, August 12, 2023 at 4:07:00 PM UTC | Last Updated: Wed, March 27, 2024 at 7:50:00 AM UTC