How to benchmark C++ code with Google Benchmark

Everett Pompeii



What is Benchmarking?

Benchmarking is the practice of testing the performance of your code to see how fast (latency) or how much work (throughput) it can do. This often overlooked step in software development is crucial for creating and maintaining fast and performant code. Benchmarking provides the necessary metrics for developers to understand how well their code performs under various workloads and conditions. For the same reasons that you write unit and integration tests to prevent feature regressions, you should write benchmarks to prevent performance regressions. Performance bugs are bugs!

Write FizzBuzz in C++

In order to write benchmarks, we need some source code to benchmark. To start off we are going to write a very simple program, FizzBuzz.

The rules for FizzBuzz are as follows:

Write a program that prints the integers from 1 to 100 (inclusive):

  • For multiples of three, print Fizz
  • For multiples of five, print Buzz
  • For multiples of both three and five, print FizzBuzz
  • For all others, print the number

There are many ways to write FizzBuzz, so we’ll go with my favorite:

#include <iostream>

int main()
{
    for (int i = 1; i <= 100; i++)
    {
        if ((i % 15) == 0)
            std::cout << "FizzBuzz\n";
        else if ((i % 3) == 0)
            std::cout << "Fizz\n";
        else if ((i % 5) == 0)
            std::cout << "Buzz\n";
        else
            std::cout << i << "\n";
    }
    return 0;
}
  • Iterate from 1 to 100, incrementing after every iteration.
  • For each number, calculate the modulus (remainder after division).
  • If the remainder is 0, then the number is a multiple of the given factor:
    • If the remainder is 0 for 15, then print FizzBuzz.
    • If the remainder is 0 for 3, then print Fizz.
    • If the remainder is 0 for 5, then print Buzz.
  • Otherwise, just print the number.

Follow Step-by-Step

In order to follow along with this step-by-step tutorial, you will need to install git, install cmake, and install the GNU Compiler Collection (GCC) g++.

🐰 The source code for this post is available on GitHub.

Create a C++ file named game.cpp, and set its contents to the above FizzBuzz implementation.

Use g++ to build an executable named game and then run it. The output should look like:

$ g++ -std=c++11 game.cpp -o game && ./game
1
2
Fizz
4
Buzz
Fizz
7
8
Fizz
Buzz
11
Fizz
13
14
FizzBuzz
...
97
98
Fizz
Buzz

🐰 Boom! You’re cracking the coding interview!

Before going any further, it is important to discuss the differences between micro-benchmarking and macro-benchmarking.

Micro-Benchmarking vs Macro-Benchmarking

There are two major categories of software benchmarks: micro-benchmarks and macro-benchmarks. Micro-benchmarks operate at a level similar to unit tests. For example, a benchmark for a function that determines Fizz, Buzz, or FizzBuzz for a single number would be a micro-benchmark. Macro-benchmarks operate at a level similar to integration tests. For example, a benchmark for a function that plays the entire game of FizzBuzz, from 1 to 100, would be a macro-benchmark.

Generally, it is best to test at the lowest level of abstraction possible. In the case of benchmarks, this makes them both easier to maintain and helps to reduce the amount of noise in the measurements. However, just as having some end-to-end tests can be very useful for sanity checking that the entire system comes together as expected, having macro-benchmarks can be very useful for making sure that the critical paths through your software remain performant.
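
To make the distinction concrete, here is a minimal sketch of what both levels might look like using Google Benchmark (the tool we set up below). It assumes a hypothetical fizz_buzz(int) helper, like the one we extract later in this post:

#include <benchmark/benchmark.h>
#include <string>

// Hypothetical helper, defined in another source file
std::string fizz_buzz(int n);

// Micro-benchmark: a single call, similar in scope to a unit test
static void BENCHMARK_fizz_buzz(benchmark::State &state)
{
    for (auto _ : state)
    {
        fizz_buzz(15);
    }
}
BENCHMARK(BENCHMARK_fizz_buzz);

// Macro-benchmark: the whole game from 1 to 100, similar in scope to an integration test
static void BENCHMARK_whole_game(benchmark::State &state)
{
    for (auto _ : state)
    {
        for (int i = 1; i <= 100; i++)
        {
            fizz_buzz(i);
        }
    }
}
BENCHMARK(BENCHMARK_whole_game);
BENCHMARK_MAIN();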

Benchmarking in C++

Two popular options for benchmarking in C++ are Google Benchmark and Catch2.

Google Benchmark is a robust and versatile benchmarking library for C++ that allows developers to measure the performance of their code with high precision. One of its key benefits is its ease of integration into existing projects, especially those that already use GoogleTest. Google Benchmark provides detailed performance metrics, including the ability to measure CPU time, wall time, and memory usage. It supports a wide range of benchmarking scenarios, from simple function benchmarks to complex, parameterized tests.
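
For example, a parameterized benchmark lets you register several inputs for the same benchmark function and read them back with state.range(0). Here is a minimal sketch, assuming a hypothetical do_work(int) function that is not part of this tutorial:

#include <benchmark/benchmark.h>

// Hypothetical function under test
void do_work(int n);

static void BENCHMARK_do_work(benchmark::State &state)
{
    // state.range(0) is whichever argument was registered below
    int n = static_cast<int>(state.range(0));
    for (auto _ : state)
    {
        do_work(n);
    }
}
// Run the same benchmark with the inputs 100, 10,000, and 1,000,000
BENCHMARK(BENCHMARK_do_work)->Arg(100)->Arg(10000)->Arg(1000000);
BENCHMARK_MAIN();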

Catch2 is a modern, header-only testing framework for C++ that simplifies the process of writing and running tests. One of its primary benefits is its ease of use, with a syntax that is both intuitive and expressive, allowing developers to write tests quickly and clearly. Catch2 supports a wide range of test types, including unit tests, integration tests, behavior-driven development (BDD) style tests, and basic micro-benchmarking features.
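
For reference, a Catch2 micro-benchmark looks roughly like this. This sketch uses the Catch2 v2 single-header conventions and assumes a hypothetical fizz_buzz(int) function; check the Catch2 documentation for the exact setup in your version:

#define CATCH_CONFIG_MAIN
#define CATCH_CONFIG_ENABLE_BENCHMARKING
#include <catch2/catch.hpp>
#include <string>

// Hypothetical function under test, defined in another source file
std::string fizz_buzz(int n);

TEST_CASE("FizzBuzz benchmarks")
{
    // Catch2 times the body of the BENCHMARK block
    BENCHMARK("fizz_buzz 15")
    {
        return fizz_buzz(15);
    };
}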

Both are supported by Bencher. So why choose Google Benchmark? Google Benchmark integrates seamlessly with GoogleTest, which is the de facto standard unit test harness in the C++ ecosystem. I would suggest using Google Benchmark for benchmarking your code’s latency, especially if you are already using GoogleTest. That is, Google Benchmark is great for measuring wall clock time.

Refactor FizzBuzz

In order to benchmark our FizzBuzz application, we need to decouple our logic from our program’s main function, since benchmark harnesses can’t benchmark the main function directly. To do this, we need to make a few changes.

Let’s refactor our FizzBuzz logic into a couple of functions inside of a new file named play_game.cpp:

play_game.cpp
#include <iostream>
#include <string>

std::string fizz_buzz(int n) {
    if (n % 15 == 0) {
        return "FizzBuzz";
    } else if (n % 3 == 0) {
        return "Fizz";
    } else if (n % 5 == 0) {
        return "Buzz";
    } else {
        return std::to_string(n);
    }
}

void play_game(int n, bool should_print) {
    std::string result = fizz_buzz(n);
    if (should_print) {
        std::cout << result << std::endl;
    }
}
  • fizz_buzz: Takes in an integer n and performs the actual Fizz, Buzz, FizzBuzz, or number logic, returning the result as a string.
  • play_game: Takes in an integer n, calls fizz_buzz with that number, and, if should_print is true, prints the result.

Now, let’s create a header file named play_game.h and add the play_game function declaration to it:

play_game.h
#ifndef GAME_H
#define GAME_H
#include <string>
void play_game(int n, bool should_print);
#endif // GAME_H

Then update the main function in game.cpp to use the play_game function declared in the header file:

game.cpp
#include "play_game.h"
int main()
{
for (int i = 1; i <= 100; i++)
{
play_game(i, true);
}
}

The main function for our program iterates through the numbers 1 to 100 inclusive and calls play_game for each number, with should_print set to true.

Benchmarking FizzBuzz

In order to benchmark our code, we need to first install Google Benchmark.

Clone the library:

$ git clone https://github.com/google/benchmark.git

Enter the newly cloned directory:

$ cd benchmark

Use cmake to create a build directory to place the build output:

$ cmake -E make_directory "build"

Use cmake to generate the build system files and download any dependencies:

$ cmake -E chdir "build" cmake -DBENCHMARK_DOWNLOAD_DEPENDENCIES=on -DCMAKE_BUILD_TYPE=Release ../

Finally, build the library:

$ cmake --build "build" --config Release

Return back to the parent directory:

$ cd ..

Now let’s create a new file named benchmark_game.cpp:

benchmark_game.cpp
#include "play_game.h"
#include <benchmark/benchmark.h>
#include <iostream>
static void BENCHMARK_game(benchmark::State &state)
{
for (auto _ : state)
{
for (int i = 1; i <= 100; i++)
{
play_game(i, false);
}
}
}
BENCHMARK(BENCHMARK_game);
BENCHMARK_MAIN();
  • Import the play_game function declaration from play_game.h.
  • Import the Google Benchmark library header.
  • Create a function named BENCHMARK_game that takes in a reference to benchmark::State.
  • Iterate over the benchmark::State object.
  • For each iteration, iterate from 1 to 100 inclusive.
    • Call play_game with the current number and should_print set to false.
  • Pass the BENCHMARK_game function to the BENCHMARK runner.
  • Run the benchmark with BENCHMARK_MAIN.
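
One thing to keep in mind with micro-benchmarks like this: if the compiler can prove that a result is never used, it may optimize the work away entirely. That isn’t a problem here, because play_game is defined in a separate source file, but when benchmarking a small function defined in the same file you can wrap the result in benchmark::DoNotOptimize. A minimal sketch, using a hypothetical square function:

#include <benchmark/benchmark.h>

// A small function defined in the same file could be optimized away...
static int square(int n) { return n * n; }

static void BENCHMARK_square(benchmark::State &state)
{
    for (auto _ : state)
    {
        // ...so tell the optimizer that the result is "used"
        benchmark::DoNotOptimize(square(42));
    }
}
BENCHMARK(BENCHMARK_square);
BENCHMARK_MAIN();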

Now we’re ready to benchmark our code:

$ g++ -std=c++11 -isystem benchmark/include play_game.cpp benchmark_game.cpp -Lbenchmark/build/src -lbenchmark -lpthread -o benchmark_game && ./benchmark_game
2023-10-16T14:00:00-04:00
Running ./benchmark_game
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 5.55, 4.62, 4.69
---------------------------------------------------------
Benchmark                  Time             CPU   Iterations
---------------------------------------------------------
BENCHMARK_game          1698 ns         1688 ns       419979

🐰 Lettuce turnip the beet! We’ve got our first benchmark metrics!

Finally, we can rest our weary developer heads… Just kidding, our users want a new feature!

Write FizzBuzzFibonacci in C++

Our Key Performance Indicators (KPIs) are down, so our Product Manager (PM) wants us to add a new feature. After much brainstorming and many user interviews, it is decided that good ole FizzBuzz isn’t enough. Kids these days want a new game, FizzBuzzFibonacci.

The rules for FizzBuzzFibonacci are as follows:

Write a program that prints the integers from 1 to 100 (inclusive):

  • For multiples of three, print Fizz
  • For multiples of five, print Buzz
  • For multiples of both three and five, print FizzBuzz
  • For numbers that are part of the Fibonacci sequence, only print Fibonacci
  • For all others, print the number

The Fibonacci sequence is a sequence in which each number is the sum of the two preceding numbers. For example, starting with 0 and 1, the next number in the Fibonacci sequence is 1, followed by 2, 3, 5, 8, and so on. Numbers that are part of the Fibonacci sequence are known as Fibonacci numbers. So we’re going to have to write a function that detects Fibonacci numbers.

There are many ways to write the Fibonacci sequence and likewise many ways to detect a Fibonacci number, so we’ll go with my favorite:

play_game.cpp
bool is_fibonacci_number(int n)
{
    for (int i = 0; i <= n; ++i)
    {
        int previous = 0, current = 1;
        while (current < i)
        {
            int next = previous + current;
            previous = current;
            current = next;
        }
        if (current == n)
        {
            return true;
        }
    }
    return false;
}
  • Create a function named is_fibonacci_number that takes in an integer and returns a boolean.
  • Iterate over all numbers from 0 to our given number n, inclusive.
  • Initialize our Fibonacci sequence starting with 0 and 1 as the previous and current numbers respectively.
  • Iterate while the current number is less than the current iteration i.
  • Add the previous and current number to get the next number.
  • Update the previous number to the current number.
  • Update the current number to the next number.
  • Once current is greater than or equal to the given number n, we will exit the loop.
  • Check to see if the current number is equal to the given number n and, if so, return true.
  • Otherwise, return false.

Now we will need to update our fizz_buzz function:

play_game.cpp
std::string fizz_buzz_fibonacci(int n)
{
    if (is_fibonacci_number(n))
    {
        return "Fibonacci";
    }
    else if (n % 15 == 0)
    {
        return "FizzBuzz";
    }
    else if (n % 3 == 0)
    {
        return "Fizz";
    }
    else if (n % 5 == 0)
    {
        return "Buzz";
    }
    else
    {
        return std::to_string(n);
    }
}
  • Rename the fizz_buzz function to fizz_buzz_fibonacci to make it more descriptive.
  • Call our is_fibonacci_number helper function.
  • If the result from is_fibonacci_number is true then return Fibonacci.
  • If the result from is_fibonacci_number is false then perform the same Fizz, Buzz, FizzBuzz, or number logic, returning the result.

Because we renamed fizz_buzz to fizz_buzz_fibonacci we also need to update our play_game function:

play_game.cpp
void play_game(int n, bool should_print) {
    std::string result = fizz_buzz_fibonacci(n);
    if (should_print) {
        std::cout << result << std::endl;
    }
}

Both our main function and the BENCHMARK_game function can stay exactly the same.

Benchmarking FizzBuzzFibonacci

Now we can rerun our benchmark:

$ g++ -std=c++11 -isystem benchmark/include play_game.cpp benchmark_game.cpp -Lbenchmark/build/src -lbenchmark -lpthread -o benchmark_game && ./benchmark_game
2023-10-16T15:00:00-04:00
Running ./benchmark_game
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 4.34, 5.75, 4.71
---------------------------------------------------------
Benchmark                  Time             CPU   Iterations
---------------------------------------------------------
BENCHMARK_game         56190 ns        56054 ns        12280

Scrolling back through our terminal history, we can make an eyeball comparison between the performance of our FizzBuzz and FizzBuzzFibonacci games: 1698 ns vs 56190 ns. Your numbers will be a little different from mine. However, the difference between the two games is likely in the 30x range. That seems good to me! Especially for adding a feature as fancy sounding as Fibonacci to our game. The kids will love it!

Expand FizzBuzzFibonacci in C++

Our game is a hit! The kids do indeed love playing FizzBuzzFibonacci. So much so that word has come down from the execs that they want a sequel. But this is the modern world: we need Annual Recurring Revenue (ARR), not one-time purchases! The new vision for our game is that it is open-ended, no longer living between the bounds of 1 and 100 (even if they are inclusive). No, we’re on to new frontiers!

The rules for Open World FizzBuzzFibonacci are as follows:

Write a program that takes in any positive integer and prints:

  • For multiples of three, print Fizz
  • For multiples of five, print Buzz
  • For multiples of both three and five, print FizzBuzz
  • For numbers that are part of the Fibonacci sequence, only print Fibonacci
  • For all others, print the number

In order to have our game work for any number, we will need to accept a command line argument. Update the main function to look like this:

game.cpp
#include "play_game.h"
#include <iostream>
#include <cstdlib>
int main(int argc, char *argv[])
{
if (argc > 1 && std::isdigit(argv[1][0]))
{
int i = std::atoi(argv[1]);
play_game(i, true);
}
else
{
std::cout << "Please, enter a positive integer to play..." << std::endl;
}
return 0;
}
  • Update the main function to take in argc and argv.
  • Get the first argument passed to our game and check to see if its first character is a digit.
    • If so, parse the first argument as an integer, i.
    • Play our game with the newly parsed integer i.
  • If parsing fails or no argument is passed in, default to prompting for a valid input.

Now we can play our game with any number! Recompile our game executable and then run the executable followed by an integer to play our game:

$ g++ -std=c++11 game.cpp play_game.cpp -o game
$ ./game 9
Fizz
$ ./game 10
Buzz
$ ./game 13
Fibonacci

And if we omit or provide an invalid number:

$ ./game
Please, enter a positive integer to play...
$ ./game bad
Please, enter a positive integer to play...

Wow, that was some thorough testing! CI passes. Our bosses are thrilled. Let’s ship it! 🚀

The End


[Image: SpongeBob SquarePants "Three Weeks Later" meme]
[Image: This is Fine meme]

🐰 … the end of your career maybe?


Just kidding! Everything is on fire! 🔥

Well, at first everything seemed to be going fine. And then at 02:07 AM on Saturday my pager went off:

📟 Your game is on fire! 🔥

After scrambling out of bed, I tried to figure out what was going on. I tried to search through the logs, but that was hard because everything kept crashing. Finally, I found the issue. The kids! They loved our game so much, they were playing it all the way up to a million! In a flash of brilliance, I added two new benchmarks:

benchmark_game.cpp
static void BENCHMARK_game_100(benchmark::State &state)
{
    for (auto _ : state)
    {
        play_game(100, false);
    }
}

static void BENCHMARK_game_1_000_000(benchmark::State &state)
{
    for (auto _ : state)
    {
        play_game(1000000, false);
    }
}

BENCHMARK(BENCHMARK_game_100);
BENCHMARK(BENCHMARK_game_1_000_000);
  • A micro-benchmark BENCHMARK_game_100 for playing the game with the number one hundred (100)
  • A micro-benchmark BENCHMARK_game_1_000_000 for playing the game with the number one million (1_000_000)

When I ran it, I got this:

$ g++ -std=c++11 -isystem benchmark/include play_game.cpp benchmark_game.cpp -Lbenchmark/build/src -lbenchmark -lpthread -o benchmark_game && ./benchmark_game
2023-11-04T03:00:00-04:00
Running ./benchmark_game
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 4.98, 5.75, 4.96
-------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
-------------------------------------------------------------------
BENCHMARK_game                    75547 ns        59280 ns        12560
BENCHMARK_game_100                 1249 ns         1243 ns       564689

Wait for it… wait for it…

BENCHMARK_game_1_000_000      110879642 ns     43628118 ns           17

What! If performance scaled linearly with the input (1,000,000 is 10,000x larger than 100), then 1,249 ns x 10,000 should be 12,490,000 ns, not 110,879,642 ns 🤯 Even though I got my Fibonacci sequence code functionally correct, I must have a performance bug in there somewhere.

Fix FizzBuzzFibonacci in C++

Let’s take another look at that is_fibonacci_number function:

play_game.cpp
bool is_fibonacci_number(int n)
{
    for (int i = 0; i <= n; ++i)
    {
        int previous = 0, current = 1;
        while (current < i)
        {
            int next = previous + current;
            previous = current;
            current = next;
        }
        if (current == n)
        {
            return true;
        }
    }
    return false;
}

Now that I’m thinking about performance, I do realize that I have an unnecessary, extra loop. We can completely get rid of the for (int i = 0; i <= n; ++i) loop and just compare the current value to the given number (n) 🤦

play_game.cpp
bool is_fibonacci_number(int n)
{
    int previous = 0, current = 1;
    while (current < n)
    {
        int next = previous + current;
        previous = current;
        current = next;
    }
    return current == n;
}
  • Update our is_fibonacci_number function.
  • Initialize our Fibonacci sequence starting with 0 and 1 as the previous and current numbers respectively.
  • Iterate while the current number is less than the given number n.
  • Add the previous and current number to get the next number.
  • Update the previous number to the current number.
  • Update the current number to the next number.
  • Once current is greater than or equal to the given number n, we will exit the loop.
  • Check to see if the current number is equal to the given number n and return that result.

Now let’s rerun those benchmarks and see how we did:

$ g++ -std=c++11 -isystem benchmark/include play_game.cpp benchmark_game.cpp -Lbenchmark/build/src -lbenchmark -lpthread -o benchmark_game && ./benchmark_game
2023-11-04T05:00:00-04:00
Running ./benchmark_game
Run on (8 X 24 MHz CPU s)
CPU Caches:
L1 Data 64 KiB
L1 Instruction 128 KiB
L2 Unified 4096 KiB (x8)
Load Average: 4.69, 5.02, 4.78
-------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
-------------------------------------------------------------------
BENCHMARK_game                     2914 ns         2913 ns       242382
BENCHMARK_game_100                 34.4 ns         34.3 ns     20322076
BENCHMARK_game_1_000_000           61.6 ns         61.6 ns     11346874

Oh, wow! Our BENCHMARK_game benchmark is back down to around where it was for the original FizzBuzz. I wish I could remember exactly what that score was. It’s been three weeks though. My terminal history doesn’t go back that far, and Google Benchmark doesn’t store its results. But I think it’s close!

The BENCHMARK_game_100 benchmark is down more than 35x to 34.4 ns. And the BENCHMARK_game_1_000_000 benchmark is down more than 1,500,000x! From 110,879,642 ns to 61.6 ns!

🐰 Hey, at least we caught this performance bug before it made it to production… oh, right. Nevermind…

Catch Performance Regressions in CI

The execs weren’t happy about the deluge of negative reviews our game received due to my little performance bug. They told me not to let it happen again, and when I asked how, they just told me not to do it again. How am I supposed to manage that‽

Luckily, I’ve found this awesome open source tool called Bencher. There’s a super generous free tier, so I can just use Bencher Cloud for my personal projects. And at work where everything needs to be in our private cloud, I’ve started using Bencher Self-Hosted.

Bencher has built-in adapters, so it’s easy to integrate into CI. After following the Quick Start guide, I’m able to run my benchmarks and track them with Bencher.

$ g++ -std=c++11 -isystem benchmark/include play_game.cpp benchmark_game.cpp -Lbenchmark/build/src -lbenchmark -lpthread -o benchmark_game
$ bencher run --adapter cpp_google "./benchmark_game --benchmark_format=json"
{
  "context": {
    "date": "2023-10-16T16:00:00-04:00",
    "host_name": "bencher",
    "executable": "./benchmark_game",
    "num_cpus": 8,
    "mhz_per_cpu": 24,
    "cpu_scaling_enabled": false,
    ...
View results:
- BENCHMARK_game (Latency): https://bencher.dev/console/projects/game/perf?measures=52507e04-ffd9-4021-b141-7d4b9f1e9194&branches=3a27b3ce-225c-4076-af7c-75adbc34ef9a&testbeds=bc05ed88-74c1-430d-b96a-5394fdd18bb0&benchmarks=077449e5-5b45-4c00-bdfb-3a277413180d&start_time=1697224006000&end_time=1699816009000&upper_boundary=true
- BENCHMARK_game_100 (Latency): https://bencher.dev/console/projects/game/perf?measures=52507e04-ffd9-4021-b141-7d4b9f1e9194&branches=3a27b3ce-225c-4076-af7c-75adbc34ef9a&testbeds=bc05ed88-74c1-430d-b96a-5394fdd18bb0&benchmarks=96508869-4fa2-44ac-8e60-b635b83a17b7&start_time=1697224006000&end_time=1699816009000&upper_boundary=true
- BENCHMARK_game_1_000_000 (Latency): https://bencher.dev/console/projects/game/perf?measures=52507e04-ffd9-4021-b141-7d4b9f1e9194&branches=3a27b3ce-225c-4076-af7c-75adbc34ef9a&testbeds=bc05ed88-74c1-430d-b96a-5394fdd18bb0&benchmarks=ff014217-4570-42ea-8813-6ed0284500a4&start_time=1697224006000&end_time=1699816009000&upper_boundary=true

Using this nifty time travel device that a nice rabbit gave me, I was able to go back in time and replay what would have happened if we were using Bencher all along. You can see where we first pushed the buggy FizzBuzzFibonacci implementation. I immediately got failures in CI as a comment on my pull request. That same day, I fixed the performance bug, getting rid of that needless, extra loop. No fires. Just happy users.

Bencher: Continuous Benchmarking

🐰 Bencher

Bencher is a suite of continuous benchmarking tools. Have you ever had a performance regression impact your users? Bencher could have prevented that from happening. Bencher allows you to detect and prevent performance regressions before they make it to production.

  • Run: Run your benchmarks locally or in CI using your favorite benchmarking tools. The bencher CLI simply wraps your existing benchmark harness and stores its results.
  • Track: Track the results of your benchmarks over time. Monitor, query, and graph the results using the Bencher web console based on the source branch, testbed, benchmark, and measure.
  • Catch: Catch performance regressions in CI. Bencher uses state of the art, customizable analytics to detect performance regressions before they make it to production.

For the same reasons that unit tests are run in CI to prevent feature regressions, benchmarks should be run in CI with Bencher to prevent performance regressions. Performance bugs are bugs!

Start catching performance regressions in CI — try Bencher Cloud for free.



Published: Sun, November 3, 2024 at 4:30:00 PM UTC