How to benchmark Rust code with Criterion
Everett Pompeii
What is Benchmarking?
Benchmarking is the practice of testing the performance of your code to see how fast (latency) or how much work (throughput) it can do. This often overlooked step in software development is crucial for creating and maintaining fast and performant code. Benchmarking provides the necessary metrics for developers to understand how well their code performs under various workloads and conditions. For the same reasons that you write unit and integration tests to prevent feature regressions, you should write benchmarks to prevent performance regressions. Performance bugs are bugs!
Write FizzBuzz in Rust
In order to write benchmarks, we need some source code to benchmark. To start off we are going to write a very simple program, FizzBuzz.
The rules for FizzBuzz are as follows:
Write a program that prints the integers from 1 to 100 (inclusive):
- For multiples of three, print Fizz
- For multiples of five, print Buzz
- For multiples of both three and five, print FizzBuzz
- For all others, print the number
There are many ways to write FizzBuzz. So we’ll go with my favorite:
- Create a main function.
- Iterate from 1 to 100 inclusively.
- For each number, calculate the modulus (remainder after division) for both 3 and 5.
- Pattern match on the two remainders. If the remainder is 0, then the number is a multiple of the given factor.
- If the remainder is 0 for both 3 and 5, then print FizzBuzz.
- If the remainder is 0 for only 3, then print Fizz.
- If the remainder is 0 for only 5, then print Buzz.
- Otherwise, just print the number.
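The steps above can be sketched in Rust as a single main function. This is a minimal sketch that follows the listed steps, not necessarily the exact code from the original repository; the loop counter uses Rust’s default integer type.

```rust
fn main() {
    // Play FizzBuzz for every number from 1 to 100, inclusive.
    for i in 1..=100 {
        // Pattern match on the remainders after dividing by 3 and by 5.
        // A remainder of 0 means `i` is a multiple of that factor.
        match (i % 3, i % 5) {
            (0, 0) => println!("FizzBuzz"),
            (0, _) => println!("Fizz"),
            (_, 0) => println!("Buzz"),
            (_, _) => println!("{i}"),
        }
    }
}
```

Matching on the tuple of both remainders lets the (0, 0) arm come first, so FizzBuzz is never shadowed by the single-factor arms.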
Follow Step-by-Step
In order to follow along with this step-by-step tutorial, you will need to install Rust.
🐰 The source code for this post is available on GitHub.
With Rust installed, you can then open a terminal window and enter: cargo init game
Then navigate into the newly created game directory.
You should see a directory called src with a file named main.rs:
Replace its contents with the above FizzBuzz implementation. Then run cargo run.
The output should look like:
🐰 Boom! You’re cracking the coding interview!
A new Cargo.lock file should have been generated:
Before going any further, it is important to discuss the differences between micro-benchmarking and macro-benchmarking.
Micro-Benchmarking vs Macro-Benchmarking
There are two major categories of software benchmarks: micro-benchmarks and macro-benchmarks.
Micro-benchmarks operate at a level similar to unit tests.
For example, a benchmark for a function that determines Fizz, Buzz, or FizzBuzz for a single number would be a micro-benchmark.
Macro-benchmarks operate at a level similar to integration tests.
For example, a benchmark for a function that plays the entire game of FizzBuzz, from 1 to 100, would be a macro-benchmark.
Generally, it is best to test at the lowest level of abstraction possible. In the case of benchmarks, this makes them easier to maintain, and it helps to reduce the amount of noise in the measurements. However, just as some end-to-end tests can be very useful for sanity checking that the entire system comes together as expected, macro-benchmarks can be very useful for making sure that the critical paths through your software remain performant.
Benchmarking in Rust
The three popular options for benchmarking in Rust are libtest bench, Criterion, and Iai.
libtest is Rust’s built-in unit testing and benchmarking framework.
Though part of the Rust standard library, libtest bench is still considered unstable, so it is only available on nightly compiler releases.
To work on the stable Rust compiler, a separate benchmarking harness needs to be used.
Neither libtest bench nor that separate harness is being actively developed, though.
The most popular benchmarking harness within the Rust ecosystem is Criterion.
It works on both stable and nightly Rust compiler releases, and it has become the de facto standard within the Rust community.
Criterion is also much more feature-rich compared to libtest bench.
An experimental alternative to Criterion is Iai, from the same creator as Criterion. Unlike Criterion, however, it uses instruction counts instead of wall clock time: CPU instructions, L1 accesses, L2 accesses, and RAM accesses. This allows for single-shot benchmarking, since these metrics should stay nearly identical between runs.
All three are supported by Bencher. So why choose Criterion? Criterion is the de facto standard benchmarking harness in the Rust community. I would suggest using Criterion for benchmarking your code’s latency. That is, Criterion is great for measuring wall clock time.
Refactor FizzBuzz
In order to test our FizzBuzz application, we need to decouple our logic from our program’s main function.
Benchmark harnesses can’t benchmark the main function. To decouple them, we need to make a few changes.
Under src, create a new file named lib.rs:
Add the following code to lib.rs:
- play_game: Takes in an unsigned integer n, calls fizz_buzz with that number, and if print is true, prints the result.
- fizz_buzz: Takes in an unsigned integer n and performs the actual Fizz, Buzz, FizzBuzz, or number logic, returning the result as a string.
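A minimal sketch of what lib.rs could contain, following the two descriptions above. Using u32 as the unsigned integer type is an assumption, as is the match-based body of fizz_buzz, which reuses the pattern match from the original main.

```rust
/// Play one round for `n`, printing the result only when `print` is true.
pub fn play_game(n: u32, print: bool) {
    let result = fizz_buzz(n);
    if print {
        println!("{result}");
    }
}

/// Return "Fizz", "Buzz", "FizzBuzz", or the number itself as a String.
pub fn fizz_buzz(n: u32) -> String {
    match (n % 3, n % 5) {
        (0, 0) => "FizzBuzz".to_string(),
        (0, _) => "Fizz".to_string(),
        (_, 0) => "Buzz".to_string(),
        (_, _) => n.to_string(),
    }
}
```

Returning a String instead of printing inside fizz_buzz is what makes it benchmarkable: a benchmark can call play_game with print set to false and skip the I/O entirely.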
Then update main.rs to look like this:
- game::play_game: Import play_game from the game crate we just created with lib.rs.
- main: The main entrypoint into our program that iterates through the numbers 1 to 100 inclusive and calls play_game for each number, with print set to true.
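A sketch of the matching main.rs. In the real project, play_game is imported from the game crate defined by lib.rs; here an inline mod game stands in for that crate, an assumption made only so the example compiles as a single file.

```rust
// Stand-in for the game crate (src/lib.rs) so this sketch is self-contained.
mod game {
    pub fn play_game(n: u32, print: bool) {
        let result = fizz_buzz(n);
        if print {
            println!("{result}");
        }
    }

    pub fn fizz_buzz(n: u32) -> String {
        match (n % 3, n % 5) {
            (0, 0) => "FizzBuzz".to_string(),
            (0, _) => "Fizz".to_string(),
            (_, 0) => "Buzz".to_string(),
            (_, _) => n.to_string(),
        }
    }
}

use game::play_game;

fn main() {
    // Play every number from 1 to 100, inclusive, printing each result.
    for i in 1..=100 {
        play_game(i, true);
    }
}
```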
Benchmarking FizzBuzz
In order to benchmark our code, we need to create a benches directory and add a file to contain our benchmarks, play_game.rs:
Inside of play_game.rs add the following code:
- Import the Criterion benchmark runner.
- Import the play_game function from our game crate.
- Create a function named bench_play_game that takes in a mutable reference to Criterion.
- Use the Criterion instance (c) to create a benchmark named bench_play_game.
- Then use the benchmark runner (b) to run our macro-benchmark several times.
- Run our macro-benchmark inside of a “black box” so the compiler doesn’t optimize our code.
- Iterate from 1 to 100 inclusively.
- For each number, call play_game, with print set to false.
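The steps above can be sketched as the following benches/play_game.rs. This is a reconstruction under stated assumptions: it uses Criterion’s bench_function and iter API, takes the black box from the standard library (std::hint::black_box), and relies on the criterion_group! and criterion_main! macros to generate the main function that cargo bench runs. It only builds inside the game project with criterion as a dev-dependency and harness set to false, per the Cargo.toml changes described below.

```rust
// benches/play_game.rs (sketch; needs the criterion dev-dependency)
use criterion::{criterion_group, criterion_main, Criterion};

use game::play_game;

fn bench_play_game(c: &mut Criterion) {
    // Register a benchmark named "bench_play_game".
    c.bench_function("bench_play_game", |b| {
        // Criterion runs this closure many times and records the timings.
        b.iter(|| {
            // black_box hints to the compiler not to optimize the game away.
            std::hint::black_box(for i in 1..=100 {
                play_game(i, false)
            });
        });
    });
}

// Group the benchmark functions and generate the benchmark binary's main.
criterion_group!(benches, bench_play_game);
criterion_main!(benches);
```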
Now we need to configure the game crate to run our benchmarks.
Add the following to the bottom of your Cargo.toml file:
- criterion: Add criterion as a development dependency, since we are only using it for performance testing.
- bench: Register play_game as a benchmark and set harness to false, since we will be using Criterion as our benchmarking harness.
Now we’re ready to benchmark our code. Run cargo bench:
🐰 Lettuce turnip the beet! We’ve got our first benchmark metrics!
Finally, we can rest our weary developer heads… Just kidding, our users want a new feature!
Write FizzBuzzFibonacci in Rust
Our Key Performance Indicators (KPIs) are down, so our Product Manager (PM) wants us to add a new feature. After much brainstorming and many user interviews, it is decided that good ole FizzBuzz isn’t enough. Kids these days want a new game, FizzBuzzFibonacci.
The rules for FizzBuzzFibonacci are as follows:
Write a program that prints the integers from 1 to 100 (inclusive):
- For multiples of three, print Fizz
- For multiples of five, print Buzz
- For multiples of both three and five, print FizzBuzz
- For numbers that are part of the Fibonacci sequence, only print Fibonacci
- For all others, print the number
The Fibonacci sequence is a sequence in which each number is the sum of the two preceding numbers.
For example, starting at 0 and 1, the next number in the Fibonacci sequence would be 1, followed by 2, 3, 5, 8, and so on.
Numbers that are part of the Fibonacci sequence are known as Fibonacci numbers. So we’re going to have to write a function that detects Fibonacci numbers.
There are many ways to write the Fibonacci sequence and likewise many ways to detect a Fibonacci number. So we’ll go with my favorite:
- Create a function named is_fibonacci_number that takes in an unsigned integer and returns a boolean.
- Iterate for all numbers from 0 to our given number n inclusive.
- Initialize our Fibonacci sequence starting with 0 and 1 as the previous and current numbers respectively.
- Iterate while the current number is less than the current iteration i.
- Add the previous and current number to get the next number.
- Update the previous number to the current number.
- Update the current number to the next number.
- Once current is greater than or equal to the current iteration i, we will exit the loop.
- Check to see if the current number is equal to the given number n and if so return true.
- Otherwise, return false.
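The steps above can be sketched as follows, keeping the outer loop exactly as described. The u32 input type is an assumption, and widening the running values to u64 is my own addition: it keeps the addition from overflowing for inputs near u32::MAX without changing any result.

```rust
/// Return true if `n` is a Fibonacci number (the original, nested-loop version).
pub fn is_fibonacci_number(n: u32) -> bool {
    // Try every number `i` from 0 up to `n`, inclusive.
    for i in 0..=n {
        // Restart the sequence at 0 and 1 for each `i`.
        // u64 is used so the sum of two large Fibonacci numbers cannot overflow.
        let (mut previous, mut current): (u64, u64) = (0, 1);
        // Walk the sequence until `current` reaches or passes `i`.
        while current < u64::from(i) {
            let next = previous + current;
            previous = current;
            current = next;
        }
        // If the sequence landed exactly on `n`, it is a Fibonacci number.
        if current == u64::from(n) {
            return true;
        }
    }
    false
}
```

Every pass of the outer loop restarts the sequence from scratch, so the work grows with n times the length of the sequence up to n.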
Now we will need to update our fizz_buzz function:
- Rename the fizz_buzz function to fizz_buzz_fibonacci to make it more descriptive.
- Call our is_fibonacci_number helper function.
- If the result from is_fibonacci_number is true, then return Fibonacci.
- If the result from is_fibonacci_number is false, then perform the same Fizz, Buzz, FizzBuzz, or number logic, returning the result.
Because we renamed fizz_buzz to fizz_buzz_fibonacci, we also need to update our play_game function:
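The renamed function and the updated play_game can be sketched together. The nested-loop is_fibonacci_number is repeated so the sketch compiles on its own; as before, u32 and the internal u64 widening are assumptions.

```rust
/// Nested-loop Fibonacci check, repeated here so the sketch is self-contained.
fn is_fibonacci_number(n: u32) -> bool {
    for i in 0..=n {
        let (mut previous, mut current): (u64, u64) = (0, 1);
        while current < u64::from(i) {
            let next = previous + current;
            previous = current;
            current = next;
        }
        if current == u64::from(n) {
            return true;
        }
    }
    false
}

/// Renamed from fizz_buzz: Fibonacci numbers win, otherwise fall back to FizzBuzz.
pub fn fizz_buzz_fibonacci(n: u32) -> String {
    if is_fibonacci_number(n) {
        "Fibonacci".to_string()
    } else {
        match (n % 3, n % 5) {
            (0, 0) => "FizzBuzz".to_string(),
            (0, _) => "Fizz".to_string(),
            (_, 0) => "Buzz".to_string(),
            (_, _) => n.to_string(),
        }
    }
}

/// Unchanged, except that it now calls fizz_buzz_fibonacci.
pub fn play_game(n: u32, print: bool) {
    let result = fizz_buzz_fibonacci(n);
    if print {
        println!("{result}");
    }
}
```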
Both our main and bench_play_game functions can stay exactly the same.
Benchmarking FizzBuzzFibonacci
Now we can rerun our benchmark:
Oh, neat! Criterion tells us the difference between the performance of our FizzBuzz and FizzBuzzFibonacci games is +568.69%, which means FizzBuzzFibonacci takes roughly 6.7 times as long.
Your numbers will be a little different than mine.
However, the slowdown between the two games is likely to be 5x or more.
That seems good to me! Especially for adding a feature as fancy sounding as Fibonacci to our game.
The kids will love it!
Expand FizzBuzzFibonacci in Rust
Our game is a hit! The kids do indeed love playing FizzBuzzFibonacci.
So much so that word has come down from the execs that they want a sequel.
But this is the modern world: we need Annual Recurring Revenue (ARR), not one-time purchases!
The new vision for our game is that it is open-ended, no more living between the bounds of 1 and 100 (even if they are inclusive).
No, we’re on to new frontiers!
The rules for Open World FizzBuzzFibonacci are as follows:
Write a program that takes in any positive integer and prints:
- For multiples of three, print Fizz
- For multiples of five, print Buzz
- For multiples of both three and five, print FizzBuzz
- For numbers that are part of the Fibonacci sequence, only print Fibonacci
- For all others, print the number
In order to have our game work for any number, we will need to accept a command line argument.
Update the main function to look like this:
- Collect all of the arguments (args) passed to our game from the command line.
- Get the first argument passed to our game and parse it as an unsigned integer i.
- If parsing fails or no argument is passed in, default to playing our game with 15 as the input.
- Finally, play our game with the newly parsed unsigned integer i.
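The steps above can be sketched as follows. Two assumptions keep the sketch self-contained and testable: an inline mod game stands in for the game crate, and the parsing is pulled out into a hypothetical parse_input helper (not in the original) so it can be checked without real command line arguments.

```rust
// Stand-in for the game crate (src/lib.rs) so this sketch compiles as one file.
mod game {
    fn is_fibonacci_number(n: u32) -> bool {
        for i in 0..=n {
            let (mut previous, mut current): (u64, u64) = (0, 1);
            while current < u64::from(i) {
                let next = previous + current;
                previous = current;
                current = next;
            }
            if current == u64::from(n) {
                return true;
            }
        }
        false
    }

    pub fn fizz_buzz_fibonacci(n: u32) -> String {
        if is_fibonacci_number(n) {
            "Fibonacci".to_string()
        } else {
            match (n % 3, n % 5) {
                (0, 0) => "FizzBuzz".to_string(),
                (0, _) => "Fizz".to_string(),
                (_, 0) => "Buzz".to_string(),
                (_, _) => n.to_string(),
            }
        }
    }

    pub fn play_game(n: u32, print: bool) {
        let result = fizz_buzz_fibonacci(n);
        if print {
            println!("{result}");
        }
    }
}

use game::play_game;

/// Hypothetical helper: read the first real argument as a u32,
/// defaulting to 15 if it is missing or fails to parse.
fn parse_input(args: &[String]) -> u32 {
    args.get(1)
        .and_then(|arg| arg.parse::<u32>().ok())
        .unwrap_or(15)
}

fn main() {
    // Collect every argument; args[0] is the program name.
    let args: Vec<String> = std::env::args().collect();
    let i = parse_input(&args);
    // Play a single round with the parsed number and print the result.
    play_game(i, true);
}
```

Folding both failure cases (no argument, unparsable argument) into a single unwrap_or keeps the default in one place.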
Now we can play our game with any number!
Use cargo run followed by -- to pass arguments to our game:
And if we omit the number or provide an invalid one:
Wow, that was some thorough testing! CI passes. Our bosses are thrilled. Let’s ship it! 🚀
The End
🐰 … the end of your career maybe?
Just kidding! Everything is on fire! 🔥
Well, at first everything seemed to be going fine. And then at 02:07 AM on Saturday my pager went off:
📟 Your game is on fire! 🔥
After scrambling out of bed, I tried to figure out what was going on. I tried to search through the logs, but that was hard because everything kept crashing. Finally, I found the issue. The kids! They loved our game so much, they were playing it all the way up to a million! In a flash of brilliance, I added two new benchmarks:
- A micro-benchmark bench_play_game_100 for playing the game with the number one hundred (100)
- A micro-benchmark bench_play_game_1_000_000 for playing the game with the number one million (1_000_000)
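The two new micro-benchmarks can be sketched next to the existing macro-benchmark in benches/play_game.rs. As with the first benchmark, this assumes Criterion’s bench_function and iter API plus std::hint::black_box, and it only builds inside the game project with the criterion dev-dependency.

```rust
// benches/play_game.rs (sketch; needs the criterion dev-dependency)
use criterion::{criterion_group, criterion_main, Criterion};

use game::play_game;

// Macro-benchmark: play every number from 1 to 100.
fn bench_play_game(c: &mut Criterion) {
    c.bench_function("bench_play_game", |b| {
        b.iter(|| {
            std::hint::black_box(for i in 1..=100 {
                play_game(i, false)
            });
        });
    });
}

// Micro-benchmark: a single round with the number 100.
fn bench_play_game_100(c: &mut Criterion) {
    c.bench_function("bench_play_game_100", |b| {
        b.iter(|| std::hint::black_box(play_game(100, false)));
    });
}

// Micro-benchmark: a single round with the number 1,000,000.
fn bench_play_game_1_000_000(c: &mut Criterion) {
    c.bench_function("bench_play_game_1_000_000", |b| {
        b.iter(|| std::hint::black_box(play_game(1_000_000, false)));
    });
}

// Register all three benchmarks in one group.
criterion_group!(benches, bench_play_game, bench_play_game_100, bench_play_game_1_000_000);
criterion_main!(benches);
```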
When I ran it, I got this:
Wait for it… wait for it…
What! 403.57 ns x 10,000 (one million is 10,000 times one hundred) should be about 4,035,700 ns, not 9,596,800 ns (9.5968 ms x 1,000,000 ns/ms) 🤯
Even though I got my Fibonacci sequence code functionally correct, I must have a performance bug in there somewhere.
Fix FizzBuzzFibonacci in Rust
Let’s take another look at that is_fibonacci_number function:
Now that I’m thinking about performance, I do realize that I have an unnecessary, extra loop.
We can completely get rid of the for i in 0..=n {} loop and just compare the current value to the given number (n) 🤦
- Update our is_fibonacci_number function.
- Initialize our Fibonacci sequence starting with 0 and 1 as the previous and current numbers respectively.
- Iterate while the current number is less than the given number n.
- Add the previous and current number to get the next number.
- Update the previous number to the current number.
- Update the current number to the next number.
- Once current is greater than or equal to the given number n, we will exit the loop.
- Check to see if the current number is equal to the given number n and return that result.
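The fixed version can be sketched like this. It walks the Fibonacci sequence once, so its cost grows only with the number of Fibonacci numbers below n rather than with n itself. Widening to u64 is again my own addition so that stepping past the largest u32 Fibonacci number cannot overflow.

```rust
/// Return true if `n` is a Fibonacci number, walking the sequence only once.
pub fn is_fibonacci_number(n: u32) -> bool {
    // Widen once up front; the loop below only ever compares u64 values.
    let n = u64::from(n);
    // Start the sequence at 0 and 1 as `previous` and `current`.
    let (mut previous, mut current): (u64, u64) = (0, 1);
    // Advance until `current` reaches or passes `n`.
    while current < n {
        let next = previous + current;
        previous = current;
        current = next;
    }
    // `n` is a Fibonacci number exactly when the sequence landed on it.
    current == n
}
```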
Now let’s rerun those benchmarks and see how we did:
Oh, wow! Our bench_play_game benchmark is back down to around where it was for the original FizzBuzz.
I wish I could remember exactly what that score was. It’s been three weeks though.
My terminal history doesn’t go back that far.
And Criterion only compares against the most recent result.
But I think it’s close!
The bench_play_game_100 benchmark is down more than 16x, -93.950%.
And the bench_play_game_1_000_000 benchmark is down more than 300,000x! 9,596,800 ns to 30.403 ns!
We even maxed out Criterion’s change meter, which only goes up to -100.000%!
🐰 Hey, at least we caught this performance bug before it made it to production… oh, right. Never mind…
Catch Performance Regressions in CI
The execs weren’t happy about the deluge of negative reviews our game received due to my little performance bug. They told me not to let it happen again, and when I asked how, they just told me not to do it again. How am I supposed to manage that‽
Luckily, I’ve found this awesome open source tool called Bencher. There’s a super generous free tier, so I can just use Bencher Cloud for my personal projects. And at work where everything needs to be in our private cloud, I’ve started using Bencher Self-Hosted.
Bencher has built-in adapters, so it’s easy to integrate into CI. After following the Quick Start guide, I’m able to run my benchmarks and track them with Bencher.
Using this nifty time travel device that a nice rabbit gave me, I was able to go back in time and replay what would have happened if we were using Bencher all along. You can see where we first pushed the buggy FizzBuzzFibonacci implementation. I immediately got failures in CI as a comment on my pull request. That same day, I fixed the performance bug, getting rid of that needless, extra loop. No fires. Just happy users.
Bencher: Continuous Benchmarking
Bencher is a suite of continuous benchmarking tools. Have you ever had a performance regression impact your users? Bencher could have prevented that from happening. Bencher allows you to detect and prevent performance regressions before they make it to production.
- Run: Run your benchmarks locally or in CI using your favorite benchmarking tools. The bencher CLI simply wraps your existing benchmark harness and stores its results.
- Track: Track the results of your benchmarks over time. Monitor, query, and graph the results using the Bencher web console based on the source branch, testbed, benchmark, and measure.
- Catch: Catch performance regressions in CI. Bencher uses state-of-the-art, customizable analytics to detect performance regressions before they make it to production.
For the same reasons that unit tests are run in CI to prevent feature regressions, benchmarks should be run in CI with Bencher to prevent performance regressions. Performance bugs are bugs!
Start catching performance regressions in CI — try Bencher Cloud for free.