How to build a Custom Benchmarking Harness in Rust
Everett Pompeii
What is Benchmarking?
Benchmarking is the practice of testing the performance of your code to see how fast (latency) or how much (throughput) work it can do. This often overlooked step in software development is crucial for creating and maintaining fast and performant code. Benchmarking provides the necessary metrics for developers to understand how well their code performs under various workloads and conditions. For the same reasons that you write unit and integration tests to prevent feature regressions, you should write benchmarks to prevent performance regressions. Performance bugs are bugs!
Benchmarking in Rust
The three popular options for benchmarking in Rust are: libtest bench, Criterion, and Iai.
libtest is Rust’s built-in unit testing and benchmarking framework. Though part of the Rust standard library, libtest bench is still considered unstable, so it is only available on nightly compiler releases. To work on the stable Rust compiler, a separate benchmarking harness needs to be used. Neither of these options is being actively developed, though.
The most popular benchmarking harness within the Rust ecosystem is Criterion. It works on both stable and nightly Rust compiler releases, and it has become the de facto standard within the Rust community. Criterion is also much more feature-rich compared to libtest bench.
An experimental alternative to Criterion is Iai, from the same creator as Criterion. However, it uses instruction counts instead of wall clock time: CPU instructions, L1 accesses, L2 accesses, and RAM accesses. This allows for single-shot benchmarking since these metrics should stay nearly identical between runs.
With all that in mind, if you are looking to benchmark your code’s wall clock time, then you should probably use Criterion. If you are looking to benchmark your code in CI with shared runners, then it may be worth checking out Iai. Note, though, that Iai hasn’t been updated in over 3 years. So you might consider using Iai-Callgrind instead.
But what if you don’t want to benchmark wall clock time or instruction counts? What if you want to track some completely different benchmark‽ Luckily, Rust makes it incredibly easy to create a custom benchmarking harness.
How cargo bench Works
Before building a custom benchmarking harness, we need to understand how Rust benchmarks work. For most Rust developers, this means running the cargo bench command. The cargo bench command compiles and executes your benchmarks. By default, cargo bench will try to use the built-in (but unstable) libtest bench harness. libtest bench will then go through your code and run all functions annotated with the #[bench] attribute.

In order to use a custom benchmarking harness, we need to tell cargo bench to not use libtest bench.
Use a Custom Benchmarking Harness with cargo bench
To get cargo bench to not use libtest bench, we need to add the following to our Cargo.toml file:
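One minimal way to do this (a sketch, not necessarily the only option) is to opt the library target out of benchmarking, so cargo bench no longer runs the default libtest bench harness against it:

```toml
[lib]
# Do not run the default libtest bench harness on the library target.
bench = false
```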
Unfortunately, we can’t use the #[bench] attribute with our custom benchmarking harness. Maybe one day soon, but not today.

Instead, we have to create a separate benches directory to hold our benchmarks. The benches directory is to benchmarks what the tests directory is to integration tests. Each file inside of the benches directory is treated as a separate crate. The crate being benchmarked must therefore be a library crate. That is, it must have a lib.rs file.
For example, if we had a basic library crate named game then we could add a custom benchmark file named play_game to the benches directory. Our directory structure would look like this:
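A layout along these lines would fit that description (generated files like Cargo.lock are omitted here):

```
game
├── Cargo.toml
├── benches
│   └── play_game.rs
└── src
    └── lib.rs
```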
Next, we need to let cargo bench know about our custom benchmark crate play_game. So we update our Cargo.toml file:
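A sketch of that update: a [[bench]] entry named after the play_game file, with harness = false so Cargo compiles benches/play_game.rs as a plain binary instead of wrapping it in libtest bench:

```toml
[[bench]]
name = "play_game"
# Use our own main function instead of the libtest bench harness.
harness = false
```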
Write Code to Benchmark
Before we can write a performance test, we need to have some library code to benchmark. For our example, we are going to play the FizzBuzzFibonacci game.
The rules for FizzBuzzFibonacci are as follows:
Write a program that prints the integers from 1 to 100 (inclusive):

- For multiples of three, print Fizz
- For multiples of five, print Buzz
- For multiples of both three and five, print FizzBuzz
- For numbers that are part of the Fibonacci sequence, only print Fibonacci
- For all others, print the number
This is what our implementation looks like in src/lib.rs:
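A sketch of one way to implement these rules (the helper names fizz_buzz_fibonacci and is_fibonacci_number are this sketch’s choices; the play_game(n, print) signature matches how the benchmark calls it later):

```rust
pub fn play_game(n: u32, print: bool) {
    let result = fizz_buzz_fibonacci(n);
    if print {
        println!("{result}");
    }
}

pub fn fizz_buzz_fibonacci(n: u32) -> String {
    if is_fibonacci_number(n) {
        "Fibonacci".to_string()
    } else {
        match (n % 3, n % 5) {
            (0, 0) => "FizzBuzz".to_string(),
            (0, _) => "Fizz".to_string(),
            (_, 0) => "Buzz".to_string(),
            (_, _) => n.to_string(),
        }
    }
}

fn is_fibonacci_number(n: u32) -> bool {
    // Walk the Fibonacci sequence until we reach or pass `n`.
    let (mut previous, mut current) = (0u32, 1u32);
    while current < n {
        let next = previous + current;
        previous = current;
        current = next;
    }
    current == n
}
```

Note that every branch allocates a String on the heap, which is exactly what our custom benchmarking harness will measure.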
Create a Custom Benchmarking Harness
We are going to create a custom benchmarking harness inside of benches/play_game.rs. This custom benchmarking harness is going to measure heap allocations using the dhat-rs crate. dhat-rs is a fantastic tool for tracking heap allocations in Rust programs, created by Rust performance expert Nicholas Nethercote.
To help us manage our benchmark functions, we will be using the inventory crate by the astoundingly prolific David Tolnay.
Let’s add dhat-rs and inventory to our Cargo.toml file as dev-dependencies:
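Something along these lines should work (the dhat-rs crate is published as dhat on crates.io; the version numbers here are illustrative, so check crates.io for the latest releases):

```toml
[dev-dependencies]
dhat = "0.3"
inventory = "0.3"
```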
Create a Custom Allocator
Since our custom benchmarking harness will be measuring heap allocations, we will need to use a custom heap allocator. Rust allows you to configure a custom, global heap allocator using the #[global_allocator] attribute. Add the following to the top of benches/play_game.rs:
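Following the pattern documented by the dhat crate:

```rust
// Route all heap allocations through dhat so it can track them.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;
```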
This tells Rust to use dhat::Alloc as our global heap allocator.
🐰 You can only set one global heap allocator at a time. If you want to switch between multiple global allocators, they have to be managed through conditional compilation with Rust features.
Create a Custom Benchmark Collector
To create a custom benchmarking harness, we need a way to identify and store our benchmark functions. We will use a struct, aptly named CustomBenchmark, to encapsulate each benchmark function. A CustomBenchmark has a name and a benchmark function that returns dhat::HeapStats as its output.

Then we’ll use the inventory crate to create a collection for all of our CustomBenchmarks:
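A minimal sketch, using the field names name and benchmark_fn for the two pieces described above:

```rust
// Encapsulate a single benchmark: a name plus a function that
// returns the heap allocation stats gathered while it ran.
struct CustomBenchmark {
    name: &'static str,
    benchmark_fn: fn() -> dhat::HeapStats,
}

// Create a distributed collection of `CustomBenchmark`s with `inventory`.
inventory::collect!(CustomBenchmark);
```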
Create a Benchmark Function
Now, we can create a benchmark function that plays the FizzBuzzFibonacci game:
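A sketch of that benchmark function, following the line-by-line walkthrough below (the name bench_play_game is this sketch’s choice):

```rust
use game::play_game;

fn bench_play_game() -> dhat::HeapStats {
    // Create a dhat profiler in testing mode so we can read the heap stats.
    let _profiler = dhat::Profiler::builder().testing().build();

    // Run the whole game inside a black box so the compiler can't optimize
    // it away: play the numbers 1 through 100, without printing.
    std::hint::black_box(for i in 1..=100 {
        play_game(i, false);
    });

    // Return the heap allocation stats collected by the profiler.
    dhat::HeapStats::get()
}
```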
Going line by line:
- Create a benchmark function that matches the signature used in CustomBenchmark.
- Create a dhat::Profiler in testing mode, to collect results from our dhat::Alloc custom, global allocator.
- Run our play_game function inside of a “black box” so the compiler doesn’t optimize our code.
- Iterate from 1 to 100 inclusively.
- For each number, call play_game, with print set to false.
- Return our heap allocation stats as dhat::HeapStats.
🐰 We set print to false for the play_game function. This keeps play_game from printing to standard out. Parameterizing your library functions like this can make them more amenable to benchmarking. However, this does mean that we may not be benchmarking the library in exactly the same way that it is used in production. In this case, we have to ask ourselves:
- Are the resources it takes to print to standard out something we care about?
- Is printing to standard out a possible source of noise?
For our example, we’ve gone with:
- No, we don’t care about printing to standard out.
- Yes, it is a very likely source of noise.
Therefore, we have omitted printing to standard out as a part of this benchmark. Benchmarking is hard, and there often isn’t one right answer to questions like these. It depends.
Register the Benchmark Function
With our benchmark function written, we need to create a CustomBenchmark and register it with our benchmark collection using inventory.
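A sketch of that registration, assuming the function and struct from the previous sketches:

```rust
// Add this benchmark to the `inventory` collection of `CustomBenchmark`s.
inventory::submit! {
    CustomBenchmark {
        name: "bench_play_game",
        benchmark_fn: bench_play_game,
    }
}
```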
If we had more than one benchmark, we would repeat this same process:

- Create a benchmark function.
- Create a CustomBenchmark for the benchmark function.
- Register the CustomBenchmark with the inventory collection.
Create a Custom Benchmark Runner
Finally, we need to create a runner for our custom benchmark harness. A custom benchmark harness is really just a binary that runs all of our benchmarks for us and reports its results. The benchmark runner is what orchestrates all of that.
We want our results to be output in Bencher Metric Format (BMF) JSON. To accomplish this, we need to add one final dependency, the serde_json crate by… you guessed it, David Tolnay!
Next, we will implement a method for CustomBenchmark to run its benchmark function and then return the results as BMF JSON.
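A sketch of that method, assuming the BMF shape of benchmark name → measure → value and using the publicly documented dhat::HeapStats fields, which map onto the six Measures listed below:

```rust
impl CustomBenchmark {
    fn run(&self) -> serde_json::Value {
        // Run the benchmark function and collect its heap allocation stats.
        let heap_stats = (self.benchmark_fn)();

        // Map the dhat heap stats onto the six BMF Measures.
        let measures = serde_json::json!({
            "Final Blocks": { "value": heap_stats.curr_blocks },
            "Final Bytes": { "value": heap_stats.curr_bytes },
            "Max Blocks": { "value": heap_stats.max_blocks },
            "Max Bytes": { "value": heap_stats.max_bytes },
            "Total Blocks": { "value": heap_stats.total_blocks },
            "Total Bytes": { "value": heap_stats.total_bytes },
        });

        // BMF JSON: { "<benchmark name>": { "<measure>": { "value": <number> } } }
        let mut benchmark_map = serde_json::Map::new();
        benchmark_map.insert(self.name.to_string(), measures);
        benchmark_map.into()
    }
}
```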
The BMF JSON results contain six Measures for each benchmark:
- Final Blocks: Final number of blocks allocated when the benchmark finished.
- Final Bytes: Final number of bytes allocated when the benchmark finished.
- Max Blocks: Maximum number of blocks allocated at one time during the benchmark run.
- Max Bytes: Maximum number of bytes allocated at one time during the benchmark run.
- Total Blocks: Total number of blocks allocated during the benchmark run.
- Total Bytes: Total number of bytes allocated during the benchmark run.
Finally, we can create a main function to run all of the benchmarks in our inventory collection and output the results as BMF JSON.
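A sketch of that runner: it iterates the inventory collection, merges each benchmark’s BMF object into one map, prints the JSON, and also writes it to a results.json file:

```rust
fn main() {
    // Collect the BMF JSON results from every registered benchmark.
    let mut bmf = serde_json::Map::new();
    for benchmark in inventory::iter::<CustomBenchmark> {
        if let serde_json::Value::Object(results) = benchmark.run() {
            bmf.extend(results);
        }
    }

    // Output the combined results to standard out and to `results.json`.
    let bmf_json =
        serde_json::to_string_pretty(&bmf).expect("Failed to serialize BMF JSON");
    println!("{bmf_json}");
    std::fs::write("results.json", bmf_json).expect("Failed to write results.json");
}
```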
Run the Custom Benchmark Harness
Everything is now in place. We can finally run our custom benchmark harness.
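From the root of the game crate, this is just the usual benchmark command:

```
cargo bench
```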
The output both to standard out and to a file named results.json should look like this:
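The shape should be something like the following (these numbers are illustrative placeholders, not real measurements):

```json
{
  "bench_play_game": {
    "Final Blocks": { "value": 0 },
    "Final Bytes": { "value": 0 },
    "Max Blocks": { "value": 1 },
    "Max Bytes": { "value": 9 },
    "Total Blocks": { "value": 100 },
    "Total Bytes": { "value": 662 }
  }
}
```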
The exact numbers that you see may be a little bit different based on your computer’s architecture. But the important thing is that you at least have some values for the last four metrics.
Track Custom Benchmark Results
Most benchmark results are ephemeral. They disappear as soon as your terminal reaches its scrollback limit. Some benchmark harnesses let you cache results, but that’s a lot of work to implement. And even then, we could only store our results locally. Lucky for us, our custom benchmarking harness will work with Bencher! Bencher is a suite of continuous benchmarking tools that allows us to track the results of our benchmarks over time and catch performance regressions before they make it to production.
Once you are all set up using Bencher Cloud or Bencher Self-Hosted, you can track the results from our custom benchmarking harness by running:
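One possible invocation (the project, API token, and other bencher run flags are omitted here and depend on your setup):

```
bencher run --adapter json --file results.json "cargo bench"
```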
You can also read more about how to track custom benchmarks with Bencher and the JSON benchmark adapter.
Wrap Up
We started this post looking at the three most popular benchmarking harnesses in the Rust ecosystem: libtest bench, Criterion, and Iai. Even though they may cover the majority of the use cases, sometimes you may need to measure something other than wall clock time or instruction counts. This set us down the road to creating a custom benchmarking harness.
Our custom benchmarking harness measures heap allocations using dhat-rs. The benchmark functions were collected using inventory.
When run, our benchmarks output results as Bencher Metric Format (BMF) JSON.
We could then use Bencher to track our custom benchmark results over time and catch performance regressions in CI.
All of the source code for this guide is available on GitHub.
Bencher: Continuous Benchmarking
Bencher is a suite of continuous benchmarking tools. Have you ever had a performance regression impact your users? Bencher could have prevented that from happening. Bencher allows you to detect and prevent performance regressions before they make it to production.
- Run: Run your benchmarks locally or in CI using your favorite benchmarking tools. The bencher CLI simply wraps your existing benchmark harness and stores its results.
- Track: Track the results of your benchmarks over time. Monitor, query, and graph the results using the Bencher web console based on the source branch, testbed, benchmark, and measure.
- Catch: Catch performance regressions in CI. Bencher uses state of the art, customizable analytics to detect performance regressions before they make it to production.
For the same reasons that unit tests are run in CI to prevent feature regressions, benchmarks should be run in CI with Bencher to prevent performance regressions. Performance bugs are bugs!
Start catching performance regressions in CI — try Bencher Cloud for free.