How to build a Custom Benchmarking Harness in Rust
Everett Pompeii
What is Benchmarking?
Benchmarking is the practice of testing the performance of your code to see how fast (latency) or how much work (throughput) it can do. This often overlooked step in software development is crucial for creating and maintaining fast and performant code. Benchmarking provides the necessary metrics for developers to understand how well their code performs under various workloads and conditions. For the same reasons that you write unit and integration tests to prevent feature regressions, you should write benchmarks to prevent performance regressions. Performance bugs are bugs!
Benchmarking in Rust
The three popular options for benchmarking in Rust are: libtest bench, Criterion, and Iai.
libtest is Rust’s built-in unit testing and benchmarking framework.
Though part of the Rust standard library, libtest bench is still considered unstable,
so it is only available on nightly
compiler releases.
To work on the stable Rust compiler,
a separate benchmarking harness
needs to be used.
Neither is being actively developed, though.
The most popular benchmarking harness within the Rust ecosystem is Criterion.
It works on both stable and nightly
Rust compiler releases,
and it has become the de facto standard within the Rust community.
Criterion is also much more feature-rich compared to libtest bench.
An experimental alternative to Criterion is Iai, from the same creator as Criterion. However, it uses instruction counts instead of wall clock time: CPU instructions, L1 accesses, L2 accesses, and RAM accesses. This allows for single-shot benchmarking since these metrics should stay nearly identical between runs.
With all that in mind, if you are looking to benchmark your code’s wall clock time, then you should probably use Criterion. If you are looking to benchmark your code in CI with shared runners, then it may be worth checking out Iai. Note, though, that Iai hasn’t been updated in over 3 years. So you might consider using Iai-Callgrind instead.
But what if you don’t want to benchmark wall clock time or instruction counts? What if you want to track some completely different benchmark‽ Luckily, Rust makes it incredibly easy to create a custom benchmarking harness.
How cargo bench Works
Before building a custom benchmarking harness,
we need to understand how Rust benchmarks work.
For most Rust developers this means running the cargo bench
command.
The cargo bench
command compiles and executes your benchmarks.
By default, cargo bench
will try to use the built-in (but unstable) libtest bench harness.
libtest bench will then go through your code and run all functions annotated with the #[bench]
attribute.
In order to use a custom benchmarking harness, we need to tell cargo bench
to not use libtest bench.
Use a Custom Benchmarking Harness with cargo bench
To get cargo bench to not use libtest bench, we need to add the following to our Cargo.toml file:
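One way to do that for a library crate is to turn off benchmarking for the library target. This is a sketch, not a complete Cargo.toml, and it assumes there are no other targets that would still pull in libtest bench:

```toml
# Cargo.toml (sketch)
# Keep `cargo bench` from running the unstable libtest bench harness
# against the library target.
[lib]
bench = false
```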
Unfortunately, we can’t use the #[bench]
attribute with our custom benchmarking harness.
Maybe one day soon, but not today.
Instead, we have to create a separate benches
directory to hold our benchmarks.
The benches
directory is to benchmarks
what the tests
directory is to integration tests.
Each file inside of the benches
directory is treated as a separate crate.
The crate being benchmarked must therefore be a library crate.
That is, it must have a lib.rs
file.
For example, if we had a basic library crate named game
then we could add a custom benchmark file named play_game
to the benches
directory.
Our directory structure would look like this:
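Assuming that layout, the tree would look something like this:

```
game
├── Cargo.toml
├── benches
│   └── play_game.rs
└── src
    └── lib.rs
```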
Next, we need to let cargo bench know about our custom benchmark crate play_game. So we update our Cargo.toml file:
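The addition looks something like this. The name must match the file in the benches directory, and harness = false is what opts this target out of libtest bench:

```toml
# Cargo.toml
[[bench]]
name = "play_game"
harness = false
```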
Write Code to Benchmark
Before we can write a performance test, we need to have some library code to benchmark. For our example, we are going to play the FizzBuzzFibonacci game.
The rules for FizzBuzzFibonacci are as follows:
Write a program that prints the integers from 1 to 100 (inclusive):
- For multiples of three, print Fizz
- For multiples of five, print Buzz
- For multiples of both three and five, print FizzBuzz
- For numbers that are part of the Fibonacci sequence, only print Fibonacci
- For all others, print the number
This is what our implementation looks like in src/lib.rs
:
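Here is one possible sketch of src/lib.rs. The helper functions and their exact logic are an assumption; the parts the rest of this guide relies on are the play_game entry point and its print parameter:

```rust
// src/lib.rs (sketch)

/// Play one round of FizzBuzzFibonacci for `n`.
/// When `print` is true, the result is also written to standard out.
pub fn play_game(n: u32, print: bool) -> String {
    let result = fizzbuzz_fibonacci(n);
    if print {
        println!("{result}");
    }
    result
}

fn fizzbuzz_fibonacci(n: u32) -> String {
    if is_fibonacci_number(n) {
        "Fibonacci".to_string()
    } else {
        match (n % 3, n % 5) {
            (0, 0) => "FizzBuzz".to_string(),
            (0, _) => "Fizz".to_string(),
            (_, 0) => "Buzz".to_string(),
            (_, _) => n.to_string(),
        }
    }
}

fn is_fibonacci_number(n: u32) -> bool {
    // Walk the Fibonacci sequence until we meet or pass `n`.
    let (mut previous, mut current) = (0u32, 1u32);
    while current < n {
        let next = previous + current;
        previous = current;
        current = next;
    }
    current == n
}
```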
Create a Custom Benchmarking Harness
We are going to create a custom benchmarking harness inside of benches/play_game.rs.
This custom benchmarking harness is going to measure heap allocations
using the dhat-rs
crate.
dhat-rs
is a fantastic tool for tracking heap allocations in Rust programs
created by Rust performance expert Nicholas Nethercote.
To help us manage our benchmark functions,
we will be using the inventory
crate
by the astoundingly prolific David Tolnay.
Let’s add dhat-rs and inventory to our Cargo.toml file as dev-dependencies:
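A sketch of that addition; the version numbers here are assumptions, so use whatever recent releases resolve for you:

```toml
# Cargo.toml
[dev-dependencies]
dhat = "0.3"      # assumed version
inventory = "0.3" # assumed version
```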
Create a Custom Allocator
Since our custom benchmarking harness will be measuring heap allocations,
we will need to use a custom heap allocator.
Rust allows you to configure a custom, global heap allocator
using the #[global_allocator]
attribute.
Add the following to the top of benches/play_game.rs:
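Following the dhat documentation, that looks like this:

```rust
// benches/play_game.rs

// Use dhat's allocator as the global allocator so that every heap
// allocation made by this benchmark binary is tracked by dhat.
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;
```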
This tells Rust to use dhat::Alloc
as our global heap allocator.
🐰 You can only set one global heap allocator at a time. If you want to switch between multiple global allocators, they have to be managed through conditional compilation with Rust features.
Create a Custom Benchmark Collector
To create a custom benchmarking harness, we need a way to identify and store our benchmark functions. We will use a struct, aptly named CustomBenchmark, to encapsulate each benchmark function.
A CustomBenchmark has a name and a benchmark function that returns dhat::HeapStats as its output. Then we’ll use the inventory crate to create a collection for all of our CustomBenchmarks:
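A sketch of both pieces, with field names of our own choosing:

```rust
// benches/play_game.rs

// A named benchmark function that returns dhat heap statistics.
struct CustomBenchmark {
    name: &'static str,
    benchmark_fn: fn() -> dhat::HeapStats,
}

// Gather every registered CustomBenchmark into an inventory collection.
inventory::collect!(CustomBenchmark);
```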
Create a Benchmark Function
Now, we can create a benchmark function that plays the FizzBuzzFibonacci game:
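A sketch of that benchmark function, assuming the play_game signature from our src/lib.rs above:

```rust
// benches/play_game.rs

fn bench_play_game() -> dhat::HeapStats {
    // Create a dhat profiler in testing mode so we can read the
    // heap stats ourselves instead of dhat printing its own report.
    let _profiler = dhat::Profiler::builder().testing().build();

    // Play the game for 1 to 100 (inclusive) with printing turned off,
    // wrapping each call in a black box so the compiler can't optimize
    // the work away.
    for i in 1..=100 {
        std::hint::black_box(game::play_game(i, false));
    }

    // Return the heap allocation statistics collected by dhat.
    dhat::HeapStats::get()
}
```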
Going line by line:
- Create a benchmark function that matches the signature used in CustomBenchmark.
- Create a dhat::Profiler in testing mode, to collect results from our dhat::Alloc custom, global allocator.
- Run our play_game function inside of a “black box” so the compiler doesn’t optimize our code.
- Iterate from 1 to 100 inclusively.
- For each number, call play_game, with print set to false.
- Return our heap allocation stats as dhat::HeapStats.
🐰 We set print to false for the play_game function. This keeps play_game from printing to standard out. Parameterizing your library functions like this can make them more amenable to benchmarking. However, this does mean that we may not be benchmarking the library in exactly the same way that it is used in production. In this case, we have to ask ourselves:
- Are the resources it takes to print to standard out something we care about?
- Is printing to standard out a possible source of noise?
For our example, we’ve gone with:
- No, we don’t care about printing to standard out.
- Yes, it is a very likely source of noise.
Therefore, we have omitted printing to standard out as a part of this benchmark. Benchmarking is hard, and there often isn’t one right answer to questions like these. It depends.
Register the Benchmark Function
With our benchmark function written, we need to create a CustomBenchmark and register it with our benchmark collection using inventory.
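That registration might look like this:

```rust
// benches/play_game.rs

// Register the benchmark with the inventory collection.
inventory::submit! {
    CustomBenchmark {
        name: "bench_play_game",
        benchmark_fn: bench_play_game,
    }
}
```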
If we had more than one benchmark, we would repeat this same process:
- Create a benchmark function.
- Create a CustomBenchmark for the benchmark function.
- Register the CustomBenchmark with the inventory collection.
Create a Custom Benchmark Runner
Finally, we need to create a runner for our custom benchmark harness. A custom benchmark harness is really just a binary that runs all of our benchmarks for us and reports its results. The benchmark runner is what orchestrates all of that.
We want our results to be output in Bencher Metric Format (BMF) JSON.
To accomplish this, we need to add one final dependency,
the serde_json
crate by… you guessed it, David Tolnay!
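Our dev-dependencies now look something like this (the versions are again assumptions):

```toml
# Cargo.toml
[dev-dependencies]
dhat = "0.3"        # assumed version
inventory = "0.3"   # assumed version
serde_json = "1.0"  # assumed version
```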
Next, we will implement a method for CustomBenchmark
to run its benchmark function
and then return the results as BMF JSON.
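Here is a sketch of that method. The field names such as curr_blocks and max_bytes come from dhat::HeapStats, and the mapping onto Measure names follows the list of six Measures described next:

```rust
// benches/play_game.rs

impl CustomBenchmark {
    // Run the benchmark function and convert its dhat heap stats
    // into a BMF JSON object keyed by the benchmark name.
    fn run(&self) -> serde_json::Value {
        let heap_stats = (self.benchmark_fn)();

        // Map the dhat heap stats onto the six BMF Measures.
        let measures = serde_json::json!({
            "Final Blocks": { "value": heap_stats.curr_blocks },
            "Final Bytes": { "value": heap_stats.curr_bytes },
            "Max Blocks": { "value": heap_stats.max_blocks },
            "Max Bytes": { "value": heap_stats.max_bytes },
            "Total Blocks": { "value": heap_stats.total_blocks },
            "Total Bytes": { "value": heap_stats.total_bytes }
        });

        let mut benchmark_map = serde_json::Map::new();
        benchmark_map.insert(self.name.to_string(), measures);
        benchmark_map.into()
    }
}
```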
The BMF JSON results contain six Measures for each benchmark:
- Final Blocks: Final number of blocks allocated when the benchmark finished.
- Final Bytes: Final number of bytes allocated when the benchmark finished.
- Max Blocks: Maximum number of blocks allocated at one time during the benchmark run.
- Max Bytes: Maximum number of bytes allocated at one time during the benchmark run.
- Total Blocks: Total number of blocks allocated during the benchmark run.
- Total Bytes: Total number of bytes allocated during the benchmark run.
Finally, we can create a main
function to run all of the benchmarks in our inventory
collection
and output the results as BMF JSON.
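A sketch of that main function:

```rust
// benches/play_game.rs

fn main() {
    let mut bmf = serde_json::Map::new();

    // Run every benchmark registered with inventory and merge its results
    // into a single BMF JSON object.
    for benchmark in inventory::iter::<CustomBenchmark> {
        let result = benchmark.run();
        bmf.extend(result.as_object().unwrap().clone());
    }

    // Emit the BMF JSON to standard out and to a results.json file.
    let bmf_json = serde_json::to_string_pretty(&bmf).expect("Failed to serialize BMF JSON");
    println!("{bmf_json}");
    std::fs::write("results.json", &bmf_json).expect("Failed to write results.json");
}
```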
Run the Custom Benchmark Harness
Everything is now in place. We can finally run our custom benchmark harness.
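As with any other benchmark harness, we run it with:

```bash
cargo bench
```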
The output both to standard out
and to a file named results.json
should look like this:
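The shape of the output should resemble this sketch; the numbers here are placeholders, not real measurements:

```json
{
  "bench_play_game": {
    "Final Blocks": { "value": 0 },
    "Final Bytes": { "value": 0 },
    "Max Blocks": { "value": 3 },
    "Max Bytes": { "value": 36 },
    "Total Blocks": { "value": 264 },
    "Total Bytes": { "value": 1053 }
  }
}
```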
The exact numbers that you see may be a little bit different based on your computer’s architecture, but the important thing is that you at least have some values for the last four metrics.
Track Custom Benchmark Results
Most benchmark results are ephemeral. They disappear as soon as your terminal reaches its scrollback limit. Some benchmark harnesses let you cache results, but that’s a lot of work to implement. And even then, we could only store our results locally. Lucky for us, our custom benchmarking harness will work with Bencher! Bencher is a suite of continuous benchmarking tools that allows us to track the results of our benchmarks over time and catch performance regressions before they make it to production.
Once you are all set up using Bencher Cloud or Bencher Self-Hosted, you can track the results from our custom benchmarking harness by running:
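The exact invocation depends on your Bencher project setup, but it should look something like this sketch (project and authentication flags omitted):

```bash
bencher run --adapter json --file results.json "cargo bench"
```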
You can also read more about how to track custom benchmarks with Bencher and the JSON benchmark adapter.
Wrap Up
We started this post looking at the three most popular benchmarking harnesses in the Rust ecosystem: libtest bench, Criterion, and Iai. Even though they may cover the majority of the use cases, sometimes you may need to measure something other than wall clock time or instruction counts. This set us down the road to creating a custom benchmarking harness.
Our custom benchmarking harness measures heap allocations using dhat-rs. The benchmark functions were collected using inventory.
When run, our benchmarks output results as Bencher Metric Format (BMF) JSON.
We could then use Bencher to track our custom benchmark results over time
and catch performance regressions in CI.
All of the source code for this guide is available on GitHub.
Bencher: Continuous Benchmarking
Bencher is a suite of continuous benchmarking tools. Have you ever had a performance regression impact your users? Bencher could have prevented that from happening. Bencher allows you to detect and prevent performance regressions before they make it to production.
- Run: Run your benchmarks locally or in CI using your favorite benchmarking tools. The bencher CLI simply wraps your existing benchmark harness and stores its results.
- Track: Track the results of your benchmarks over time. Monitor, query, and graph the results using the Bencher web console based on the source branch, testbed, benchmark, and measure.
- Catch: Catch performance regressions in CI. Bencher uses state of the art, customizable analytics to detect performance regressions before they make it to production.
For the same reasons that unit tests are run in CI to prevent feature regressions, benchmarks should be run in CI with Bencher to prevent performance regressions. Performance bugs are bugs!
Start catching performance regressions in CI — try Bencher Cloud for free.