Benchmarking Overview


Most benchmark results are ephemeral. They disappear as soon as your terminal reaches its scrollback limit. Some benchmark harnesses let you cache results, but most only do so locally. Bencher allows you to track your benchmarks from both local and CI runs and compare against historical results.

The easiest way to track your benchmarks is the bencher run CLI subcommand. It wraps your existing benchmark harness output and generates a Report. This Report is then sent to the Bencher API server, where the benchmark harness output is parsed using a benchmark harness adapter. The benchmark harness adapter detects all of the Benchmarks that are present and their corresponding Metrics. These Benchmarks and Metrics are then saved along with the Report. If there is a Threshold set, then the new Metrics are compared against the historical Metrics for each Benchmark present in the Report. If a regression is detected, then an Alert will be generated.
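For example, a minimal invocation of the workflow described above might look like the following sketch. The project slug and API token are placeholders, bencher mock is used here only because it generates sample benchmark output, and the json adapter value assumes your results are in JSON form.

```sh
# A minimal sketch of the tracking workflow: run a benchmark command,
# parse its output with the JSON adapter, and send a Report to Bencher.
# The project slug and token are placeholders for your own values.
bencher run \
  --project my-project-slug \
  --token "$BENCHER_API_TOKEN" \
  --adapter json \
  "bencher mock"
```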

From here on out we will refer to your “benchmarks” as “performance regression tests” to avoid any confusion.

Benchmark

A Benchmark is a named performance regression test. The name of the performance regression test is used as the unique identifier for its Benchmark. If the performance regression test is new to Bencher, then a Benchmark is automatically created. Otherwise, the new results are added to the existing Benchmark with that name.

Be careful when renaming your performance regression tests. You will need to manually rename the corresponding Benchmark in Bencher to match the new name; otherwise, the renamed performance regression test will be treated as a brand new Benchmark. The same caution applies to moving a performance regression test: depending on the benchmark harness, the path to the test may be part of its name.
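As an illustration, a Rust libtest bench harness includes the module path in each benchmark's name, so moving a test changes its identity. The adapter spelling and the benchmark names below are assumptions for the sake of the example:

```sh
# With a Rust libtest bench harness, the module path is part of each
# benchmark's name (e.g. benches::parser::bench_parse). If that test
# moves to benches::lexer, its full name changes and Bencher would
# treat it as a brand new Benchmark unless the old one is renamed.
# The adapter name below assumes the Rust bench adapter is spelled
# rust_bench; check bencher run --help for the exact value.
bencher run \
  --project my-project-slug \
  --adapter rust_bench \
  "cargo bench"
```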

The only exception to the above caveat is ignoring a Benchmark. See suppressing alerts for a full overview.

Metric

A Metric is a single, point-in-time performance regression test result. Up to three Values may be collected for a single Metric: value, lower_value, and upper_value. The value is required for all Metrics while the lower_value and upper_value are independently optional. Which Values are collected is determined by the benchmark harness adapter.
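For instance, with the JSON adapter a single Metric carrying all three Values could be reported along these lines. The file name and numbers are made up, and the --file flag is assumed to accept a results file in place of a benchmark command:

```sh
# Hypothetical results.json holding one Benchmark ("my_benchmark") with
# a single Latency Metric: a required value plus the two optional Values.
#
#   {
#     "my_benchmark": {
#       "latency": {
#         "value": 88.0,
#         "lower_value": 87.42,
#         "upper_value": 88.88
#       }
#     }
#   }
#
# An API token is assumed to be provided via the BENCHER_API_TOKEN
# environment variable.
bencher run --project my-project-slug --adapter json --file results.json
```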

Measure

A Measure is the unit of measurement for a Metric. By default, all Projects start with both a Latency Measure and a Throughput Measure, with units of nanoseconds (ns) and operations per second (ops/s), respectively. The Measure is determined by the benchmark harness adapter.
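Continuing the JSON sketch above, the Measure appears as the key under each Benchmark name. Here latency and throughput are assumed to be the slugs of the two default Measures, and the numbers are made up:

```sh
# Hypothetical results reporting against both default Measures:
# nanoseconds (ns) for latency and operations per second (ops/s)
# for throughput.
#
#   {
#     "my_benchmark": {
#       "latency":    { "value": 88.0 },
#       "throughput": { "value": 1187.5 }
#     }
#   }
bencher run --project my-project-slug --adapter json --file results.json
```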


Report

A Report is a collection of Benchmarks and their Metrics for a particular Branch and Testbed. Reports are most often generated using the bencher run CLI subcommand. See how to track performance regression tests for a full overview.

Branch

A Branch is the git ref used when running a Report (i.e. a branch name or tag). By default, all Projects start with a main Branch. When using the bencher run CLI subcommand, main is the default Branch if one is not provided. See branch selection for a full overview.

The Head of a Branch is the most recent instance of the Branch. It references the most recent Start Point, if there is one. Whenever a Branch gets a new Start Point, it gets a new Head. See branch selection for a full overview.
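In CI, that usually means passing the current git branch explicitly. A sketch for GitHub Actions pull requests, where GITHUB_HEAD_REF holds the pull request branch name:

```sh
# Report against the pull request branch instead of the default main
# Branch. GITHUB_HEAD_REF is set by GitHub Actions on pull_request runs.
bencher run \
  --project my-project-slug \
  --branch "$GITHUB_HEAD_REF" \
  --adapter json \
  "bencher mock"
```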

Start Point

A Branch can have a Start Point. A Start Point is another Branch at a specific version (and git hash, if available). Historical Metrics and optionally Thresholds are copied over from the Start Point. See branch selection for a full overview.
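A sketch of creating a feature Branch whose Start Point is main, assuming the --start-point family of flags is available in your CLI version:

```sh
# Create (or reuse) the feature-x Branch with main as its Start Point,
# pinned to main's current git hash, and copy main's Thresholds over.
# The --start-point flags are an assumption; check bencher run --help.
bencher run \
  --project my-project-slug \
  --branch feature-x \
  --start-point main \
  --start-point-hash "$(git rev-parse main)" \
  --start-point-clone-thresholds \
  --adapter json \
  "bencher mock"
```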

Testbed

A Testbed is the name of the testing environment used when running a Report. By default, all Projects start with a localhost Testbed. When using the bencher run CLI subcommand, localhost is the default Testbed if one is not provided.
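For example, naming the CI runner explicitly keeps CI results separate from local localhost runs. The testbed name here is arbitrary:

```sh
# Report results under a dedicated CI Testbed rather than localhost.
bencher run \
  --project my-project-slug \
  --testbed ubuntu-latest \
  --adapter json \
  "bencher mock"
```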


Threshold

A Threshold is used to catch performance regressions. A Threshold is assigned to a unique combination of: Branch, Testbed, and Measure. See thresholds for a full overview.
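A sketch of creating such a Threshold with the bencher threshold create subcommand, assuming a Student's t-test Model. The measure slug and the exact spelling of the --test value are assumptions, so check bencher threshold create --help for the accepted names:

```sh
# Create a Threshold for the main Branch on the localhost Testbed
# against the built-in Latency Measure, using a Student's t-test with
# an Upper Boundary of 0.99 (a one-sided 99th percentile limit).
bencher threshold create \
  --project my-project-slug \
  --branch main \
  --testbed localhost \
  --measure latency \
  --test t_test \
  --upper-boundary 0.99 \
  --token "$BENCHER_API_TOKEN"
```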

Test

A Test is used by a Threshold to detect performance regressions. The combination of a Test and its parameters is called a Model. See thresholds for a full overview.

Model

A Model is the combination of a Test and its parameters for a Threshold. A Model must have a Lower Boundary, Upper Boundary, or both.

  • Lower Boundary: A Lower Boundary is used when a smaller value would indicate a performance regression, such as with the Throughput Measure.
  • Upper Boundary: An Upper Boundary is used when a larger value would indicate a performance regression, such as with the Latency Measure.

Each Boundary is used to calculate a Boundary Limit. Then every new Metric is checked against each Boundary Limit. An Alert is generated when a new Metric is below a Lower Boundary Limit or above an Upper Boundary Limit. See thresholds for a full overview.
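For example, assuming a Student's t-test Model with an Upper Boundary of 0.99, the Upper Boundary Limit would be the value at the one-sided 99th percentile of the historical Metrics; a new Latency Metric above that Limit fails the Test and generates an Alert.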

Boundary Limit

A Boundary Limit is the value calculated from a Lower Boundary or Upper Boundary. It is used to compare against a new Metric. An Alert is generated when a new Metric is below a Lower Boundary Limit or above an Upper Boundary Limit. See thresholds for a full overview.

Alert

An Alert is generated when a new Metric fails a Test by being below a Lower Boundary Limit or above an Upper Boundary Limit. See thresholds for a full overview.
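In CI you usually want the pipeline to fail when an Alert is generated. A sketch using the --err flag, which makes bencher run exit with an error if an Alert is generated:

```sh
# Fail the CI job on a detected performance regression: --err makes
# bencher run return a non-zero exit code if an Alert is generated.
bencher run \
  --project my-project-slug \
  --branch main \
  --testbed ubuntu-latest \
  --err \
  --adapter json \
  "bencher mock"
```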



🐰 Congrats! You have learned all about tracking your performance regression tests! 🎉


Keep Going: bencher run CLI Subcommand ➡



Published: Sat, August 12, 2023 at 4:07:00 PM UTC | Last Updated: Thu, October 10, 2024 at 7:50:00 AM UTC