Engineering Review: 2025 Edition

Everett Pompeii


When developing a new technology like Bencher, there is a fundamental tension between choosing boring technology and beating the averages. In the moment, it can be difficult to tell exactly where you stand in this tug-of-war. Every three years, the Rust programming language releases a new edition. I think this is a nice cadence: long enough for real progress to be made, yet short enough not to drift too far afield. With Bencher turning three years old this spring, I thought it would be a great time to stop and reflect on all of the engineering decisions that got us here.

In this post, I’m going to look back at where Bencher has spent its “innovation tokens” over the past three years. Bencher is an open source suite of continuous benchmarking tools. I’ll start at the frontend of the Bencher architecture and move all the way down the stack. At each stop along the way, I’ll discuss how we got here and give a binary verdict on how each engineering decision panned out.

Frontend

Frontend Library

As a recovering C++ developer, I’m a pretty big fan of Rust. If I had my druthers, I would have written Bencher in full-stack Rust. Dig back into the deep recesses of the Bencher repo, and you’ll see me trying to do exactly that. I experimented with Yew, Seed, and Sycamore. While they may work great for some projects, there was one major sticking point I just couldn’t get over: JavaScript interop.

While JS interop is possible from WASM via Rust, it was not going to be easy. I knew I wanted Bencher to have highly interactive plots. This meant using a library like D3, which meant JS interop.

So if I was going to have to use JavaScript, which library should I choose?

Going back to those Rust crates I experimented with, Yew is the Rust analogue to React Hooks. I’d built and deployed a frontend using React Hooks in the past, so I knew the most about this framework. However, I found the React Hooks lifecycle very complicated and full of gotchas and weird edge cases.

I really liked the core principles of functional reactive programming (FRP). This led me to try out both Elm and its Rust analogue, Seed. Unfortunately, using Elm suffers from the same issue as using Rust: Elm requires its own JavaScript interop. I also found The Elm Architecture a bit too restrictive for my liking.

Of all the Rust frameworks that I tried, I liked Sycamore the most. Sycamore was inspired by Solid. The more that I learned about Solid, the more I liked it. Unlike React, Solid does not use a virtual DOM. Instead, it compiles to good ole JavaScript. This makes it much faster, smaller, and easier to work with. Solid is made up of just a few powerful primitives that allow for fine-grained reactivity. When something in the UI gets updated, only the code that depends on it will rerun. Over the past three years, I have found Solid to be a pleasure to work with.

Technology | Verdict
Yew | ❌
Seed | ❌
Sycamore | ❌
Elm | ❌
SolidJS | ✅

Frontend Framework

Solid itself is just a library. In order to build a modern frontend, I was going to need a full-fledged web app framework. Wanting to keep things simple, I put all of my eggs in the Solid basket, and I initially used SolidStart. At the time, SolidStart only supported single-page apps (SPAs).

An SPA was fine for getting going. Eventually though, I needed to start caring about things like SEO. I was beginning to write a lot more Bencher documentation. I was also planning the Learn section of the site. This meant that I needed both client-side rendering (CSR) and static site generation (SSG). SolidStart was very young, and I couldn’t get it to meet all of my needs.

After learning about Astro and trying it out, I decided to port the entire Bencher frontend from SolidStart over to Astro. This had a couple of drawbacks. The most obvious was the effort involved. Honestly, it wasn’t too bad. Astro has its islands architecture and a first-class Solid integration. I was also able to take a lot of the logic that I needed from the Solid Router, and it just sort of worked.

The great compromise, which is still present today, is that Bencher went from a single-page app to a multi-page app. Most places you click in the console cause a full page rerender. When I first made the switch, Astro had the promise of view transitions. I tried them out, but they were buggy. I still need to circle back.

In the meantime, SolidStart seems to have caught up some. It now supports both CSR and SSG, though I haven’t checked whether they both work on the same site like I need. Water under the bridge.

Technology | Verdict
SolidStart | ❌
Astro | ✅

Frontend Language

Astro has built-in TypeScript support. In the transition from SolidStart to Astro, I also started the transition from JavaScript to TypeScript. Bencher’s TypeScript config is set to Astro’s strictest setting. However, Astro does not perform type checking during builds. As of writing, Bencher still has 604 type errors. These type errors are used more like hints when editing code, but they don’t block the build (yet).

I also added Typeshare to sync Bencher’s Rust data types with the TypeScript frontend. This has been incredibly helpful for developing the Bencher Console. Further, all of the field validators for things like usernames, emails, etc. are shared between the Rust code and the TypeScript frontend via WASM. It’s been a bit of a hassle to get WASM working in both SolidStart and Astro. The largest class of error that I’ve seen in the frontend has been places where a WASM function is called but the WASM module hasn’t loaded yet. I’ve figured out how to fix it, but I’ll still sometimes forget, and it crops up again.
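To make that setup a little more concrete, here is a rough sketch of what the Rust side might look like. The type, field, and function names are hypothetical rather than Bencher’s actual code: Typeshare generates a matching TypeScript interface from the annotated struct, and the wasm_bindgen function is compiled to WASM so the frontend can call the exact same validator as the API server.

```rust
// Hedged sketch: hypothetical shared type and validator, not Bencher's real code.
use serde::{Deserialize, Serialize};
use typeshare::typeshare;
use wasm_bindgen::prelude::*;

// `typeshare` emits a matching TypeScript interface for this struct,
// so the frontend and backend types can never drift apart.
#[typeshare]
#[derive(Serialize, Deserialize)]
pub struct JsonSignup {
    pub name: String,
    pub email: String,
}

// The same validator is compiled to WASM and called from the TypeScript console
// before the request is ever sent to the Rust API server.
#[wasm_bindgen]
pub fn is_valid_email(email: &str) -> bool {
    // Deliberately simplistic check, just for the sketch.
    email.contains('@') && !email.starts_with('@') && !email.ends_with('@')
}
```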

Having both the shared types and validators auto-generated from the Rust code has made interfacing with the frontend much easier. They both get checked in CI, so they are never out of sync. All I have to do is make sure the HTTP requests are well formed, and it all just works. This makes not being able to use full-stack Rust sting a little less.

Technology | Verdict
Rust | ❌
JavaScript | ❌
TypeScript | ✅
Typeshare | ✅
WASM | ✅

Frontend Hosting

My initial decision to go “all-in” on Solid was pretty heavily influenced by Netlify hiring the creator of Solid to work on it full-time. You see, Netlify’s biggest competitor is Vercel. Vercel created and maintains Next.js. And I figured Netlify wanted Solid to be their Next.js. Therefore, I thought there would be no better place to host a SolidStart site than Netlify.

By default, Netlify tries to get you to use their build system. Using Netlify’s build system makes it very hard to do atomic deploys. Netlify would still publish the frontend even if the backend pipeline failed. Very bad! This led me to build the frontend in the same CI/CD environment as the backend and then just upload the latest version to Netlify with their CLI. When I made the transition from SolidStart to Astro, I was able to keep the same CI/CD setup, since Astro has a first-party Netlify integration.

Bencher managed to stay under the Netlify free tier for quite some time. With Bencher’s increasing popularity though, we have started to exceed some of the free tier limits. I have considered moving the Astro site over to sst on AWS. However, the cost savings haven’t seemed worth the effort at this point.

Technology | Verdict
Netlify Builds | ❌
Netlify Deploys | ✅

Backend

Backend Language

Rust.

Technology | Verdict
Rust | ✅

HTTP Server Framework

One of my top considerations when selecting a Rust HTTP server framework was built-in OpenAPI spec support. For the same reasons that I invested in setting up Typeshare and WASM on the frontend, I wanted the ability to auto-generate both API docs and clients from that spec. It was important to me that this functionality was built-in and not a third-party add-on. For the automation to actually be worth it, it has to work pretty close to 100% of the time. This means the maintenance and compatibility burden needs to be on the core framework engineers themselves. Otherwise, you will inevitably find yourself in edge case hell.

Another key consideration was the risk of abandonment. There are several once promising Rust HTTP frameworks that now sit all but abandoned. The only framework that I found that had built-in OpenAPI spec support that I was willing to bet on was Dropshot. Dropshot was created and is still maintained by Oxide Computer.
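To give a feel for why the built-in support matters, here is a hedged sketch of a Dropshot endpoint. The route and types are hypothetical, not Bencher’s real API, and exact signatures vary a bit between Dropshot versions; the point is that the same annotation that registers the handler is what drives the OpenAPI generation.

```rust
// Hedged sketch: hypothetical route and types, not Bencher's real API.
use dropshot::{endpoint, ApiDescription, HttpError, HttpResponseOk, RequestContext};
use schemars::JsonSchema;
use serde::Serialize;

#[derive(Serialize, JsonSchema)]
struct ApiVersion {
    version: String,
}

// The same `#[endpoint]` annotation that registers the handler
// feeds the built-in OpenAPI spec generation.
#[endpoint {
    method = GET,
    path = "/v0/server/version",
}]
async fn server_version(
    _rqctx: RequestContext<()>,
) -> Result<HttpResponseOk<ApiVersion>, HttpError> {
    Ok(HttpResponseOk(ApiVersion {
        version: env!("CARGO_PKG_VERSION").to_string(),
    }))
}

fn main() {
    let mut api = ApiDescription::new();
    api.register(server_version).unwrap();
    // From here, `api.openapi(..)` can write the OpenAPI spec out as JSON,
    // which is what downstream tooling like Progenitor consumes.
}
```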

I’ve only had one major issue with Dropshot thus far. When an error is generated by the API server, it causes a CORS failure on the frontend due to missing response headers. This means that the web frontend can’t display very helpful error messages to users. Instead of working on upstreaming a fix, I put my efforts towards making Bencher easier and more intuitive to use. But it turns out the solution was less than 100 lines of code. Joke’s on me!

As an aside, the axum framework had not yet been released when I started working on Bencher. If it had been around at the time, I may have tried to pair it with one of the many 3rd party OpenAPI add-ons, despite my better judgment. Lucky for me, axum wasn’t yet there to tempt me. Dropshot has been a great choice. See the API Client section for more on this point.

Technology | Verdict
Dropshot | ✅

Database

I’ve tried to keep Bencher as simple as possible. The first version of Bencher took everything, including the benchmark results themselves, through URL query params. I quickly learned that all browsers have a limit on URL length. Makes sense.

Next, I considered storing the benchmark results in git and just generating a static HTML file with the plots and results. However, this approach has two major drawbacks. First, the git clone times would eventually become untenable for heavy users. Second, all historical data would have to be present in the HTML file, leading to very long initial load times for heavy users. A dev tool should love its heavy users, not punish them.

It turns out there’s a solution to my problem. It’s called a database.

So why not just pull in Postgres and call it a day? Well, I really wanted folks to be able to self-host Bencher. The simpler I could make the architecture, the easier (and cheaper) it would be for others to self-host. I was already going to require two containers due to the separate frontend and backend. Could I avoid a third? Yep!

Before Bencher, I had only used SQLite as a test database. The developer experience was fantastic, but I never considered running it in production. Then I came across Litestream. Litestream is a disaster recovery tool for SQLite. It runs in the background and continuously replicates changes to S3 or any other datastore of your choosing. This makes it both easy to use and incredibly cost efficient to run, since writes to S3 cost next to nothing. Think pennies per day for a small instance.

When I first ran across Litestream, there was also the promise of live read replicas coming soon. However, this never came to fruition. The suggested alternative was a successor project by the same developer called LiteFS. There are major downsides to LiteFS though. It does not offer built-in disaster recovery if all replicas go down. In order to have multiple replicas, you have to infect your application logic with the concept of whether each instance is a reader or a writer. And the absolute blocker was that it requires a Consul instance to be running at all times to manage the replicas. The entire point of using SQLite was to avoid yet another service. Thankfully, I didn’t try to use LiteFS with Bencher Cloud either, as LiteFS Cloud was sunset a year after launch, and LiteFS is now all but dead.

Currently, the small downtime between deploys is handled by the Bencher CLI. In the future, I plan to move to zero-downtime deploys using Kamal. With Rails 8.0 defaulting to Kamal and SQLite, I feel fairly confident that Kamal and Litestream should pair well together.

Technology | Verdict
URL Query Params | ❌
git + HTML | ❌
SQLite | ✅
Litestream | ✅
LiteFS | ❌

Database Driver

The closer to the database I get, the more strongly typed I want things. It’s okay to play a little fast and loose on the frontend. If I mess up, everything will be right as rain with the next push to production. If I corrupt the database though, it’s much more of an ordeal to set right. With that in mind, I chose to use Diesel.

Diesel is a strongly typed object relational mapper (ORM) and query builder for Rust. It checks all database interactions at compile time, preventing runtime errors. This compile time checking also makes Diesel a zero-cost abstraction over SQL. Other than a small bug on my end when making things 1200x faster with performance tuning, I have had no runtime SQL errors when working with Diesel.
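For a feel of what that compile-time checking buys you, here is a minimal Diesel sketch. The table and column names are hypothetical, not Bencher’s actual schema; the point is that a typo in a column name or a type mismatch fails the build instead of blowing up at runtime.

```rust
// Hedged sketch: hypothetical schema, not Bencher's real tables.
use diesel::prelude::*;
use diesel::sqlite::SqliteConnection;

diesel::table! {
    benchmarks (id) {
        id -> Integer,
        name -> Text,
    }
}

#[derive(Queryable, Debug)]
struct Benchmark {
    id: i32,
    name: String,
}

fn find_benchmarks(
    conn: &mut SqliteConnection,
    pattern: &str,
) -> diesel::QueryResult<Vec<Benchmark>> {
    use self::benchmarks::dsl::*;
    // Column names and types are checked at compile time:
    // a typo here fails the build instead of erroring at runtime.
    benchmarks
        .filter(name.like(pattern))
        .order(id.desc())
        .load::<Benchmark>(conn)
}
```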

🐰 Fun Fact: Diesel uses Bencher for continuous benchmarking!

Technology | Verdict
Diesel | ✅

Backend Hosting

In the same way that I chose Netlify for my frontend hosting because I was using Solid, I chose Fly.io for my backend hosting because I was using Litestream. Fly.io had just hired the creator of Litestream to work on it full-time. As mentioned above, this work on Litestream was eventually cannibalized by LiteFS, and LiteFS is now dead. So that didn’t really pan out as I had hoped.

In the future when I switch to Kamal, I’ll also be moving off of Fly.io. Fly.io has had a couple of major outages, each of which took Bencher down for half a day. But the biggest issue is the impedance mismatch that comes from using Litestream.

Every time I log into the Fly.io dashboard, I see this warning:

ℹ Your app is running on a single machine

Scale and run your app on more Machines to ensure high availability with one command:

fly scale count 2

Check out the documentation for more details on scaling.

But with Litestream, you still can’t have more than one machine! You all never delivered read replication, like you promised!

So yeah, that’s all a little ironic and frustrating. At one point, I looked into libSQL and Turso. However, libSQL requires a special backend server for replication which makes it not work with Diesel. Either way, it looks like I dodged another end-of-life shutdown there as well. I am very interested to see what Turso does with Limbo, their Rust rewrite of SQLite. But I won’t be making that switch anytime soon. The next stop is a nice, boring, and stable VM running Kamal.

The AWS S3 backend for the Litestream replication has worked flawlessly. Even with the rug pull around Litestream and Fly.io, I still think I made the right call using Litestream with Bencher. I’m starting to hit some scaling issues with Bencher Cloud, but that’s a good problem to have.

Technology | Verdict
Fly.io | ❌
AWS S3 | ✅

CLI

CLI Library

When building a Rust CLI, Clap is pretty much the de facto standard. So imagine my shock when I first publicly demoed Bencher and Clap’s creator, Ed Page, was there! 🤩

Over time, I keep finding more and more helpful things that Clap can do. It’s a bit embarrassing, but I just recently discovered the default_value option. All of these capabilities really help to shrink the amount of code I have to maintain in the bencher CLI.
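For a sense of how little code this takes, here is a hedged sketch of a Clap derive setup using default_value. The flag names and defaults are illustrative, not the real bencher CLI definition.

```rust
// Hedged sketch: hypothetical flags, not the real `bencher` CLI.
use clap::Parser;

#[derive(Parser, Debug)]
#[command(name = "bencher", about = "Continuous benchmarking CLI (sketch)")]
struct Cli {
    /// API host to talk to; `default_value` replaces hand-rolled fallback logic.
    #[arg(long, default_value = "https://api.bencher.dev")]
    host: String,

    /// Number of times to retry a failed request.
    #[arg(long, default_value_t = 3)]
    attempts: u32,
}

fn main() {
    let cli = Cli::parse();
    println!("host: {}, attempts: {}", cli.host, cli.attempts);
}
```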

🐰 Fun Fact: Clap uses Bencher to track binary size!

Technology | Verdict
Clap | ✅

API Client

A major factor in picking Dropshot as Bencher’s HTTP server framework was its built-in ability to generate an OpenAPI spec. I was hopeful that one day I could auto-generate an API client from that spec. A year or so later, the creators of Dropshot delivered: Progenitor.

Progenitor is the yin to Dropshot’s yang. Using the OpenAPI spec from Dropshot, Progenitor can generate a Rust API client in either a positional pattern:

client.instance_create("bencher", "api", None)

or a builder pattern:

client.instance_create().organization("bencher").project("api").send()

Personally, I prefer the latter, so that’s what Bencher uses. Progenitor can also generate an entire Clap CLI to interact with the API. However, I haven’t used it. I needed to have tighter control over things, especially for commands like bencher run.

The only notable downside I’ve found with the generated types is that, due to limitations in JSON Schema, you can’t just use an Option<Option<Item>> when you need to disambiguate between a missing item key and an item key with the value set to null. This is possible with something like double_option, but it all looks the same at the level of the JSON Schema. Using a flattened or untagged inner struct enum doesn’t play nice with Dropshot. The only solution that I found was to use a top-level, untagged enum. There are only two such fields in the entire API at this point, so it’s not a huge deal.
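To illustrate the shape of that workaround, here is a hedged sketch using made-up type and field names, not Bencher’s actual request bodies (the real types would also derive JsonSchema for Dropshot). The first variant only matches when the key is present and explicitly null; everything else falls through to the second variant.

```rust
// Hedged sketch: hypothetical update request, not Bencher's actual API types.
use serde::Deserialize;

#[derive(Deserialize, Debug)]
#[serde(untagged)]
enum UpdateProject {
    // Matches `{"slug": null}`: serde only deserializes `()` from an explicit
    // JSON `null`, and a missing key is an error here, so this variant cannot
    // match when `slug` is simply omitted.
    NullSlug { name: Option<String>, slug: () },
    // Matches everything else: `slug` omitted (None) or set to a new value (Some).
    Patch {
        name: Option<String>,
        slug: Option<String>,
    },
}

fn main() {
    let set: UpdateProject = serde_json::from_str(r#"{"slug": "new-slug"}"#).unwrap();
    let clear: UpdateProject = serde_json::from_str(r#"{"slug": null}"#).unwrap();
    let keep: UpdateProject = serde_json::from_str("{}").unwrap();
    // Prints Patch, NullSlug, and Patch respectively.
    println!("{set:?}\n{clear:?}\n{keep:?}");
}
```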

Technology | Verdict
Progenitor | ✅

Development

Developer Environment

As I started working on Bencher, folks were calling for the end of localhost. I was already well past due for a new development laptop, so I decided to try out a cloud development environment. At the time, GitHub Codespaces wasn’t generally available (GA) for my use case, so I went with Gitpod.

This experiment lasted for about six months. My conclusion: cloud development environments do not work well for side projects. Do you want to hop on and knock out five minutes of work? Nope! You’re going to sit there and wait for your dev environment to reinitialize itself for the 1,000th time. Oh, you have an entire afternoon on the weekend to really crank out some work? Nope! Your dev environment is just going to randomly stop working while you are using it. Again and again and again.

I ran into these issues as a paid user. At $25/month, I could get a brand new M1 MacBook Pro with much better specs every five years. When Gitpod announced they were changing their pricing from a flat rate to be usage based, I just let them cancel my plan and headed on over to apple.com.

Maybe this was all an issue with Gitpod’s now abandoned decision to use Kubernetes. But I’m in no hurry to try out another cloud development environment with Bencher again. I did eventually port the Gitpod configuration over to a dev container to make it easier for contributors to get started. For me though, I’m sticking with localhost.

Technology | Verdict
Gitpod | ❌
M1 MacBook Pro | ✅

Continuous Integration

Bencher is open source. As a modern open source project, you sort of have to be on GitHub. This means the path of least resistance for continuous integration (CI) is GitHub Actions. Over the years, I’ve begun to loathe YAML-based CI DSLs. They each have their own quirks, and when it comes to a company as big as GitHub, even a request as simple as getting a ⚠️ icon instead of an ❌ icon can languish for years.

This motivated me to try out Dagger. At the time, you could only use Dagger via this esoteric language called CUE. I tried. I really did. For like a whole weekend. Maybe if ChatGPT had existed back then, I could have made it through. But it wasn’t just me. Dagger eventually abandoned CUE altogether for more sane SDKs. By that time though, it was too late for me.

Defeated by Dagger, I accepted my YAML CI DSL fate, and Bencher now uses GitHub Actions. Heck, I even built a Bencher CLI GitHub Action. Be the change problem you wish to see in the world.

Technology | Verdict
Dagger | ❌
GitHub Actions | ⚠️

Conclusion

Building Bencher has taught me a lot about the trade-offs that come with each engineering decision. There are some choices that I would make differently now, but that’s a good thing. It means that I’ve learned a thing or two along the way. Overall, I’m very happy with where Bencher is today. Bencher has gone from a sketch in my notebook to a full-fledged product with a growing user base, a vibrant community, and paying customers. I’m excited to see where the next three years take us!

Stack | Component | Technology | Verdict
Frontend | Frontend Library | Yew | ❌
Frontend | Frontend Library | Seed | ❌
Frontend | Frontend Library | Sycamore | ❌
Frontend | Frontend Library | Elm | ❌
Frontend | Frontend Library | SolidJS | ✅
Frontend | Frontend Framework | SolidStart | ❌
Frontend | Frontend Framework | Astro | ✅
Frontend | Frontend Language | Rust | ❌
Frontend | Frontend Language | JavaScript | ❌
Frontend | Frontend Language | TypeScript | ✅
Frontend | Frontend Language | Typeshare | ✅
Frontend | Frontend Language | WASM | ✅
Frontend | Frontend Hosting | Netlify Builds | ❌
Frontend | Frontend Hosting | Netlify Deploys | ✅
Backend | Backend Language | Rust | ✅
Backend | HTTP Server Framework | Dropshot | ✅
Backend | Database | URL Query Params | ❌
Backend | Database | git + HTML | ❌
Backend | Database | SQLite | ✅
Backend | Database | Litestream | ✅
Backend | Database | LiteFS | ❌
Backend | Database Driver | Diesel | ✅
Backend | Backend Hosting | Fly.io | ❌
Backend | Backend Hosting | AWS S3 | ✅
CLI | CLI Library | Clap | ✅
CLI | API Client | Progenitor | ✅
Development | Developer Environment | Gitpod | ❌
Development | Developer Environment | M1 MacBook Pro | ✅
Development | Continuous Integration | Dagger | ❌
Development | Continuous Integration | GitHub Actions | ⚠️

Bencher: Continuous Benchmarking

🐰 Bencher

Bencher is a suite of continuous benchmarking tools. Have you ever had a performance regression impact your users? Bencher could have prevented that from happening. Bencher allows you to detect and prevent performance regressions before they make it to production.

  • Run: Run your benchmarks locally or in CI using your favorite benchmarking tools. The bencher CLI simply wraps your existing benchmark harness and stores its results.
  • Track: Track the results of your benchmarks over time. Monitor, query, and graph the results using the Bencher web console based on the source branch, testbed, benchmark, and measure.
  • Catch: Catch performance regressions in CI. Bencher uses state of the art, customizable analytics to detect performance regressions before they make it to production.

For the same reasons that unit tests are run in CI to prevent feature regressions, benchmarks should be run in CI with Bencher to prevent performance regressions. Performance bugs are bugs!

Start catching performance regressions in CI — try Bencher Cloud for free.



Published: Sun, February 2, 2025 at 4:30:00 PM UTC