From the very start, I knew that the Bencher Perf API
was going to be one of the most demanding endpoints, performance-wise.
I believe the main reason that so many folks have had to reinvent the benchmark tracking wheel
is that the existing off-the-shelf tools don’t handle the high dimensionality required.
By “high dimensionality”, I mean being able to track performance over time and across multiple dimensions:
Branches, Testbeds, Benchmarks, and Measures.
This ability to slice and dice across five different dimensions leads to a very complex model.
Because of this inherent complexity and the nature of the data,
I considered using a time series database for Bencher.
In the end though, I settled on using SQLite instead.
I figured it was better to do things that don’t scale
than to spend the extra time learning an entirely new database architecture that may or may not actually help.
Over time, the demands on the Bencher Perf API have also increased.
Originally, you had to select all of the dimensions that you wanted to plot manually.
This created a lot of friction for users to get to a useful plot.
To solve this, I added a list of the most recent Reports to the Perf Pages,
and by default, the most recent Report was selected and plotted.
This means that if there were 112 benchmarks in the most recent Report, then all 112 would be plotted.
The model also got even more complicated with the ability to track and visualize Threshold Boundaries.
With this in mind, I made a few performance related improvements.
Since the Perf Plot needs the most recent Report to start plotting,
I refactored the Reports API to get a Report’s result data in a single call to the database instead of iterating.
The time window for the default Report query was set to four weeks, instead of being unbounded.
I also drastically limited the scope of all database handles, reducing lock contention.
To help communicate the loading state to users, I added a status bar spinner for both the Perf Plot and the dimension tabs.
I also had a failed attempt last fall at using a composite query to get all of the Perf results in a single query,
instead of using a quadruple nested for loop.
This led to me hitting the Rust type system recursion limit,
repeatedly overflowing the stack,
suffering through insane (much longer than 38 seconds) compile times,
and finally dead ending at SQLite’s max number of terms in a compound select statement.
With all of that under my belt, I knew that I really needed to dig in here
and put my performance engineer pants on.
I had never profiled a SQLite database before,
and honestly, I had never really profiled any database before.
Now wait a minute, you might be thinking.
My LinkedIn profile says I was a “Database Administrator” for almost two years.
And I never profiled a database‽
Yep. That’s a story for another time I suppose.
From ORM to SQL Query
The first hurdle I ran into was getting the SQL query out of my Rust code.
I use Diesel as the object–relational mapper (ORM) for Bencher.
Diesel creates parameterized queries.
It sends the SQL query and its bind parameters separately to the database.
That is, the substitution is done by the database.
Therefore, Diesel cannot provide a complete query to the user.
The best method that I found was using the diesel::debug_query function to output the parameterized query:
And then hand cleaning the query and substituting in the bind parameters to get valid SQL:
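As a rough illustration (with a made-up, much smaller query standing in for the real Perf query), that cleanup looks something like this:

```sql
-- Hypothetical example: Diesel's debug output uses `?` placeholders
-- and lists the bind values separately, e.g. `-- binds: [42]`.
SELECT metric.id, metric.value
FROM metric
WHERE metric.measure_id = ?

-- Hand-substituting the bind values produces SQL that can be run
-- directly in the sqlite3 shell.
SELECT metric.id, metric.value
FROM metric
WHERE metric.measure_id = 42;
```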
If you know of a better way, please let me know!
This is the way that the maintainer of the project suggested though,
so I just went with it.
Now that I had a SQL query, I was finally ready to… read a whole lot of documentation.
SQLite Query Planner
The SQLite website has great documentation for its Query Planner.
It explains exactly how SQLite goes about executing your SQL query,
and it teaches you which indexes are useful and what operations to look out for, like full table scans.
In order to see how the Query Planner would execute my Perf query,
I needed to add a new tool to my tool belt: EXPLAIN QUERY PLAN
You can either prefix your SQL query with EXPLAIN QUERY PLAN
or run the .eqp on dot command before your query.
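For example, with a stand-in query (not the actual Perf query):

```sql
-- Prefix the query to see how SQLite plans to execute it,
-- without actually running it.
-- Alternatively, run the `.eqp on` dot command once in the sqlite3 shell,
-- and every subsequent query prints its plan automatically.
EXPLAIN QUERY PLAN
SELECT metric.id, metric.value
FROM metric
INNER JOIN boundary ON boundary.metric_id = metric.id
WHERE metric.measure_id = 42;
```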
Either way, I got a result that looks like this:
Oh, boy!
There is a lot here.
But the three big things that jumped out to me were:
1. SQLite is creating an on-the-fly view that scans the entire boundary table
2. SQLite is then scanning the entire metric table
3. SQLite is creating two on-the-fly indexes
And just how big are the metric and boundary tables?
Well, they just so happen to be the two largest tables,
as they are where all the Metrics and Boundaries are stored.
Since this was my first SQLite performance tuning rodeo,
I wanted to consult an expert before making any changes.
SQLite Expert
SQLite has an experimental “expert” mode that can be enabled with the .expert command.
It suggests indexes for queries, so I decided to give it a try.
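In the sqlite3 shell, that looks something like this (again with a stand-in query, not the real Perf query):

```sql
-- Enable expert mode, then enter the query; instead of running it,
-- SQLite prints suggested CREATE INDEX statements and the resulting plan.
.expert
SELECT metric.id, metric.value
FROM metric
INNER JOIN boundary ON boundary.metric_id = metric.id
WHERE metric.measure_id = 42;
```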
This is what it suggested:
This is definitely an improvement!
It got rid of the scan on the metric table and both of the on-the-fly indexes.
Honestly, I wouldn’t have gotten the first two indexes on my own.
Thank you, SQLite Expert!
Now the only thing left to get rid of was that darn SCAN boundary.
SQLite View
When I added the ability to track and visualize Threshold Boundaries last year,
I had a decision to make in the database model.
There is a 1-to-0/1 relationship between a Metric and its corresponding Boundary.
That is, a Metric can relate to zero or one Boundary, and a Boundary can only ever relate to one Metric.
So I could have just expanded the metric table to include all of the boundary data, with every boundary-related field being nullable.
Or I could create a separate boundary table with a UNIQUE foreign key to the metric table.
To me, the latter option felt a lot cleaner, and I figured I could always deal with any performance implications later.
These were the effective queries used to create the metric and boundary tables:
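In rough outline, the model looks like this (a simplified sketch; the real tables have more columns, and the names here are only illustrative):

```sql
-- Simplified sketch of the two tables; column names are illustrative.
CREATE TABLE metric (
    id INTEGER PRIMARY KEY NOT NULL,
    report_benchmark_id INTEGER NOT NULL,
    measure_id INTEGER NOT NULL,
    value DOUBLE NOT NULL
);

-- The UNIQUE foreign key enforces the 1-to-0/1 relationship:
-- every Boundary points at exactly one Metric,
-- and each Metric has at most one Boundary.
CREATE TABLE boundary (
    id INTEGER PRIMARY KEY NOT NULL,
    threshold_id INTEGER NOT NULL,
    metric_id INTEGER NOT NULL UNIQUE,
    lower_limit DOUBLE,
    upper_limit DOUBLE,
    FOREIGN KEY (metric_id) REFERENCES metric (id)
);
```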
And it turns out “later” had arrived.
I tried to simply add an index for boundary(metric_id), but that did not help.
I believe the reason has to do with the fact that the Perf query originates from the metric table,
and because that relationship is 0/1, or put another way, nullable,
the boundary table has to be scanned (O(n)) and cannot be searched (O(log(n))).
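For reference, that attempt was just a plain index on the foreign key (index name hypothetical):

```sql
-- Adding this index alone did not get rid of the SCAN on the boundary table.
CREATE INDEX index_boundary_metric_id ON boundary(metric_id);
```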
This left me with one clear option.
I needed to create a view that flattened the metric and boundary relationship
to keep SQLite from having to do a full scan of the boundary table.
This is the query I used to create the new metric_boundary view:
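Here is a sketch of the shape of that view (the real view selects more columns; the names follow the simplified sketch above):

```sql
-- Flatten the 1-to-0/1 Metric/Boundary relationship into a single view.
-- The LEFT OUTER JOIN keeps Metrics that do not (yet) have a Boundary.
CREATE VIEW metric_boundary AS
SELECT
    metric.id AS metric_id,
    metric.report_benchmark_id,
    metric.measure_id,
    metric.value,
    boundary.id AS boundary_id,
    boundary.threshold_id,
    boundary.lower_limit,
    boundary.upper_limit
FROM metric
LEFT OUTER JOIN boundary ON boundary.metric_id = metric.id;
```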
With this solution, I created a pre-packaged SELECT statement.
This view only caused about a 4% increase in the size of the database,
even though it is for the two largest tables.
Best of all, it lets me have my cake and eat it too in my source code.
Creating a view with Diesel was surprisingly easy.
I just had to use the exact same macros that Diesel uses when generating my normal schema.
With that said, I learned to appreciate Diesel a lot more throughout this experience.
See Bonus Bug for all the juicy details.
Wrap Up
With the three new indexes and a view added, this is what the Query Planner now shows:
Look at all of those beautiful SEARCHes, all with existing indexes! 🥲
And after deploying my changes to production:
Now it was time for the final test.
How fast does that Rustls Perf page load?
Here, I’ll even give you an anchor tag. Click it and then refresh the page.
Bencher is a suite of continuous benchmarking tools.
Have you ever had a performance regression impact your users?
Bencher could have prevented that from happening.
Bencher allows you to detect and prevent performance regressions before they make it to production.
Run: Run your benchmarks locally or in CI using your favorite benchmarking tools. The bencher CLI simply wraps your existing benchmark harness and stores its results.
Track: Track the results of your benchmarks over time. Monitor, query, and graph the results using the Bencher web console based on the source branch, testbed, benchmark, and measure.
Catch: Catch performance regressions in CI. Bencher uses state of the art, customizable analytics to detect performance regressions before they make it to production.
For the same reasons that unit tests are run in CI to prevent feature regressions, benchmarks should be run in CI with Bencher to prevent performance regressions. Performance bugs are bugs!
I’m already dogfooding Bencher with Bencher,
but all of the existing benchmark harness adapters are for micro-benchmarking harnesses.
Most HTTP harnesses are really load testing harnesses,
and load testing is different from benchmarking.
Further, I’m not looking to expand Bencher into load testing anytime soon.
That is a very different use case that would require very different design considerations,
like that time series database for instance.
Even if I did have load testing in place,
I would really need to be running against a fresh pull of production data for this to have been caught.
The performance differences for these changes were negligible with my test database.
Click to view test database benchmark results
Before:
After indexes and materialized view:
All of this leads me to believe that I should create a micro-benchmark
that runs against the Perf API endpoint and dogfood the results with Bencher.
This will require a sizable test database
to make sure that these sorts of performance regressions get caught in CI.
I have created a tracking issue for this work, if you would like to follow along.
This has all got me thinking though:
What if you could do snapshot testing of your SQL database query plan?
That is, you could compare your current vs candidate SQL database query plans.
SQL query plan testing would sort of be like instruction-count-based benchmarking for databases.
The query plan helps to indicate that there may be an issue with the runtime performance,
without having to actually benchmark the database query.
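As a very rough sketch of the idea, a test could snapshot the plan text and fail if it changes for the worse, for example if a SEARCH turns into a SCAN:

```sql
-- Hypothetical sketch: capture this plan text as a snapshot, then have CI
-- re-run EXPLAIN QUERY PLAN against the candidate schema and diff the output.
EXPLAIN QUERY PLAN
SELECT metric_boundary.metric_id, metric_boundary.value
FROM metric_boundary
WHERE metric_boundary.measure_id = 42;
-- The test would fail if the snapshotted SEARCHes regress into SCANs.
```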
I have created a tracking issue for this as well.
Please, feel free to add a comment with thoughts or any prior art that you are aware of!
Bonus Bug
I was trying to be too clever,
and in my Diesel materialized view schema I had allowed this join:
I assumed that this macro was somehow smart enough
to relate the alert.boundary_id to the metric_boundary.boundary_id.
But alas, it was not.
It seems to have just picked the first column of metric_boundary (metric_id) to relate to alert.
Once I discovered the bug, it was easy to fix.
I just had to use an explicit join in the Perf query:
🐰 That’s all folks!
Published: Wed, April 17, 2024 at 10:11:00 AM UTC | Last Updated: Mon, April 22, 2024 at 11:45:00 PM UTC