Diesel: Continuous Benchmarking Case Study

Everett Pompeii


What is Diesel?

Diesel is a safe, extensible object-relational mapper (ORM) and query builder for Rust. An ORM translates between a relational database and a programming language. Diesel is a strongly typed ORM: it checks all database interactions at compile time, preventing runtime errors. Because this checking happens at compile time, Diesel can also act as a zero-cost abstraction over SQL. By eliminating the boilerplate, Diesel lets you focus on what really matters in your Rust code.
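
To give a feel for what that looks like, here is a minimal sketch of a compile-time checked Diesel query. The `users` table, `User` struct, and `first_ten_users` function are illustrative assumptions for this post, not taken from Diesel's own code or benchmarks:

```rust
// A minimal sketch of a compile time checked Diesel query.
// The `users` table and `User` struct are illustrative only.
use diesel::prelude::*;

diesel::table! {
    users (id) {
        id -> Integer,
        name -> Text,
    }
}

#[derive(Queryable, Selectable)]
#[diesel(table_name = users)]
struct User {
    id: i32,
    name: String,
}

fn first_ten_users(conn: &mut SqliteConnection) -> QueryResult<Vec<User>> {
    // The query is checked against the `users` schema at compile time,
    // so a typo in a column name fails the build rather than the runtime.
    users::table
        .select(User::as_select())
        .limit(10)
        .load(conn)
}
```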

Diesel supports the three most popular open source databases:

  • PostgreSQL
  • MySQL
  • SQLite

There is also a third-party crate that extends Diesel to add support for Oracle.

🐰 Fun Fact: Bencher uses Diesel to manage our SQLite database!

Benchmarking Diesel

Diesel’s first commit was made by the project’s creator Sage Griffin on 23 August 2015. This was just three months after Rust 1.0 was released, so they originally used the built-in libtest bench benchmarking harness. Then on 03 November 2020, they switched over to using Criterion as their benchmarking harness.
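
Criterion benchmarks are ordinary Rust functions registered with the criterion_group! and criterion_main! macros. The sketch below shows the general shape of such a wall clock benchmark; the `trivial_query` function is a stand-in for illustration, not one of Diesel's actual benchmarks:

```rust
// A minimal sketch of a Criterion wall clock benchmark.
// `trivial_query` is a stand-in for a real Diesel query under test.
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn trivial_query() -> u64 {
    (0..1_000u64).sum()
}

fn bench_trivial_query(c: &mut Criterion) {
    c.bench_function("trivial_query", |b| b.iter(|| black_box(trivial_query())));
}

criterion_group!(benches, bench_trivial_query);
criterion_main!(benches);
```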

When Diesel switched over to using Criterion, they also started to benchmark themselves against other Rust ORMs. This benchmark comparison suite has since grown to include the other most popular ORMs in the Rust ecosystem.

It wasn’t until 05 May 2021 that contributor and now lead maintainer Georg Semmler started tracking the Diesel benchmark results over time. He created a nightly GitHub Action to run the benchmarks and a separate metrics repo to track the benchmark results. Due to the noise of the GitHub Actions runners, these results are only relied upon to show a general trend for large changes. There is also no monitoring set up, so the Diesel maintainers have to manually check the results to spot a performance regression.

To get an even better understanding of the performance impact of a change, instruction count based benchmarking was added on 20 March 2022 using criterion-perf-events. Because criterion-perf-events relies on Linux perf events, it cannot be run on GitHub Actions runners, so these benchmarks have to be run locally.
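
criterion-perf-events plugs into Criterion as a custom measurement that counts hardware events instead of wall clock time. The following sketch follows the crate's documented setup; the benchmarked function is again a stand-in rather than one of Diesel's benchmarks:

```rust
// A sketch of instruction count benchmarking with criterion-perf-events,
// based on the crate's documented setup. It requires Linux perf events,
// so it runs locally rather than on GitHub Actions runners.
// `trivial_query` is a stand-in for a real Diesel benchmark.
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use criterion_perf_events::Perf;
use perfcnt::linux::{HardwareEventType, PerfCounterBuilderLinux};

fn trivial_query() -> u64 {
    (0..1_000u64).sum()
}

fn bench_instructions(c: &mut Criterion<Perf>) {
    c.bench_function("trivial_query/instructions", |b| {
        b.iter(|| black_box(trivial_query()))
    });
}

criterion_group!(
    name = benches;
    // Measure retired CPU instructions instead of wall clock time.
    config = Criterion::default().with_measurement(Perf::new(
        PerfCounterBuilderLinux::from_hardware_event(HardwareEventType::Instructions)
    ));
    targets = bench_instructions
);
criterion_main!(benches);
```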

Continuous Benchmarking for Diesel

Even before Diesel started to track their benchmark results over time, Georg Semmler had set up Relative Continuous Benchmarking for the Diesel project. Between 02 November 2020 and 29 January 2021, he added a GitHub Actions workflow that was activated via a run-benchmarks label on a pull request. Once the label was added, GitHub Actions would run the benchmarks on both the current master branch and the PR branch and then compare the results using critcmp. Due to security concerns around pwn requests, he had not yet found a way to safely post the results to the PR itself. This meant that the benchmark results had to be manually inspected to detect a performance regression, and the results would be deleted after 90 days.

After finding out about Bencher, he wanted to take advantage of Bencher’s advanced statistical thresholds and alerts as well as its ability to safely comment on pull requests. Bencher was also flexible enough to allow Diesel to keep using its run-benchmarks label and to keep critcmp as a fallback. With these changes merged into Diesel on 23 February 2024, they are now able to more easily compare their Relative Continuous Benchmarking results and catch performance regressions in pull requests.

Trophy Case

  • PR #3180: The benchmarks caught a >13% performance regression in boxed query performance.
  • PR #2774 | PR #2788 | PR #3098: The benchmarks showed that the proposed optimizations were not necessary. That is, they did not result in the expected performance improvements, so they were not merged.
  • PR #2799: The benchmarks caught slight performance regressions in the SQLite and MySQL backends. These regressions were deemed acceptable by the maintainers given the ergonomic improvements afforded by the changes.
  • PR #2827: The benchmarks showed that using prepared statements for inserts improved performance by >10% and up to 2x-3x for SQLite.
  • PR #2931: The benchmarks showed that rewriting the bind serialization layer for SQLite created a >35% performance improvement on inserts.
  • PR #3109: The instruction count based benchmarks showed that optimizing the SQLite statement iterator led to a 1-6% improvement in query performance.
  • PR #3110: The instruction count based benchmarks showed that inlining building from a SQL row led to a 5% improvement.
  • PR #3944: This was the first pull request to use the new Bencher integration! The benchmarks still had to be run locally in order to detect a performance regression. So let’s take this chance to celebrate a false negative and tune those thresholds!

Wrap Up

The Diesel project and Georg Semmler in particular have put a lot of time and effort into making sure that Diesel stays fast. They have a comprehensive benchmarking suite that can be run both locally and in CI for wall clock based benchmarks. The instruction count based benchmarks must be run locally, which requires a little more effort by the maintainers to use. They also track their benchmark results over time in order to spot any unforeseen performance changes.

In addition to benchmarking themselves, they have also created and maintained a comparative benchmarking suite. This allows Diesel to compare themselves against the other eight most popular ORMs in the Rust ecosystem. They use this comparison as a chance to learn from other ORMs on where they can improve.

Diesel has woven together a patchwork of benchmarking solutions to catch performance regressions before they get released. If your project does not have the time and resources to build and maintain a bespoke continuous benchmarking solution like the Rustls project does, then you may want to take a page from the Diesel project’s playbook.

A very special thank you to Georg Semmler for reviewing this case study.

Bencher: Continuous Benchmarking

🐰 Bencher

The Diesel project uses Bencher to catch performance regressions in CI for performance sensitive pull requests.

Bencher is a suite of continuous benchmarking tools. Have you ever had a performance regression impact your users? Bencher could have prevented that from happening. Bencher allows you to detect and prevent performance regressions before they make it to production.

  • Run: Run your benchmarks locally or in CI using your favorite benchmarking tools. The bencher CLI simply wraps your existing benchmark harness and stores its results.
  • Track: Track the results of your benchmarks over time. Monitor, query, and graph the results using the Bencher web console based on the source branch, testbed, and measure.
  • Catch: Catch performance regressions in CI. Bencher uses state of the art, customizable analytics to detect performance regressions before they make it to production.

For the same reasons that unit tests are run in CI to prevent feature regressions, benchmarks should be run in CI with Bencher to prevent performance regressions. Performance bugs are bugs!

Start catching performance regressions in CI — try Bencher Cloud for free.



Published: Thu, May 23, 2024 at 7:39:00 AM UTC