Thresholds & Alerts
Thresholds are how you catch performance regressions with Bencher. A Threshold is assigned to a unique combination of: Branch, Testbed, and Measure. A Threshold must have a Lower Boundary, Upper Boundary, or both. Each Boundary is used to calculate a Boundary Limit. Then every new Metric is checked against each Boundary Limit.
- Lower Boundary
- A Lower Boundary is used when a smaller value would indicate a performance regression, such as with the Throughput Measure.
- Upper Boundary
- An Upper Boundary is used when a larger value would indicate a performance regression, such as with the Latency Measure.
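There are several types of Thresholds:
- Static Thresholds
- Percentage Thresholds
- z-score Thresholds
- t-test Thresholds
- Log Normal Thresholds
- Interquartile Range Thresholds
- Delta Interquartile Range Thresholds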
Alerts
Alerts are generated when a new Metric is below a Threshold’s Lower Boundary Limit or above a Threshold’s Upper Boundary Limit.
To fail a CI build in the event of an Alert, set the --err flag when using the bencher run CLI subcommand.
See the --err docs for more details.
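The Alert rule above can be sketched in a few lines of Python. This is only an illustration of the comparison, not Bencher's implementation; the function name check_metric and its return values are hypothetical.

```python
from typing import Optional


def check_metric(metric: float,
                 lower_limit: Optional[float] = None,
                 upper_limit: Optional[float] = None) -> Optional[str]:
    """Return which Boundary Limit was violated, or None if no Alert applies.

    Hypothetical helper: a limit of None means that Boundary is not set.
    """
    # An Alert is generated only when the Metric is strictly below the
    # Lower Boundary Limit or strictly above the Upper Boundary Limit.
    if lower_limit is not None and metric < lower_limit:
        return "lower"
    if upper_limit is not None and metric > upper_limit:
        return "upper"
    return None


print(check_metric(85.0, lower_limit=90.0))   # lower
print(check_metric(95.0, lower_limit=90.0))   # None
```

A Metric exactly equal to a Boundary Limit does not generate an Alert in this sketch, matching the "below" and "above" wording.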
Suppressing Alerts
Sometimes it can be useful to suppress Alerts for a particular Benchmark. The best way to do this is by adding one of these special suffixes to that Benchmark’s name:
_bencher_ignore
BencherIgnore
-bencher-ignore
For example, if your Benchmark was named my_flaky_benchmark, then renaming it to my_flaky_benchmark_bencher_ignore would ignore just that particular Benchmark going forward.
Ignored Benchmarks do not get checked against the Threshold even if one exists.
However, the Metrics for ignored Benchmarks are still stored.
Continuing with our example, the results from my_flaky_benchmark_bencher_ignore would still be stored in the database under my_flaky_benchmark.
If you remove the suffix and return to the original Benchmark name, then things will pick right back up where you left off.
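As a rough illustration of the naming rule, the sketch below splits a Benchmark name into its stored name and an "ignored" flag. The helper split_ignore_suffix is hypothetical and only mirrors the behavior described above; it is not how Bencher itself parses names.

```python
from typing import Tuple

# The three special suffixes listed above.
IGNORE_SUFFIXES = ("_bencher_ignore", "BencherIgnore", "-bencher-ignore")


def split_ignore_suffix(name: str) -> Tuple[str, bool]:
    """Return (stored_name, ignored) for a Benchmark name (hypothetical helper)."""
    for suffix in IGNORE_SUFFIXES:
        if name.endswith(suffix):
            # Strip the suffix so the results are stored under the original name.
            return name[: -len(suffix)], True
    return name, False


print(split_ignore_suffix("my_flaky_benchmark_bencher_ignore"))
print(split_ignore_suffix("my_flaky_benchmark"))
```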
Static Thresholds
A Static Threshold (static) is the simplest Threshold.
If a new Metric is below a set Lower Boundary or above a set Upper Boundary, an Alert is generated.
That is, the Lower/Upper Boundary is an explicit Lower/Upper Boundary Limit.
Either a Lower Boundary, Upper Boundary, or both must be set.
Static Thresholds work best when the value of the Metric should stay within a constant range, such as instruction counts.
- Static Threshold Lower Boundary
- A Static Threshold Lower Boundary can be any floating point number. It is used when a smaller value would indicate a performance regression. The Lower Boundary must be less than or equal to the Upper Boundary, if both are specified.
- For example, if you had a Static Threshold with a Lower Boundary set to 100, the Lower Boundary Limit would likewise be 100 and any value less than 100 would generate an Alert.
- Static Threshold Upper Boundary
- A Static Threshold Upper Boundary can be any floating point number. It is used when a greater value would indicate a performance regression. The Upper Boundary must be greater than or equal to the Lower Boundary, if both are specified.
- For example, if you had a Static Threshold with an Upper Boundary set to 100, the Upper Boundary Limit would likewise be 100 and any value greater than 100 would generate an Alert.
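The Static Threshold rules (at least one Boundary, Lower less than or equal to Upper, and Boundaries used directly as Boundary Limits) could be modeled as follows. The StaticThreshold class is a hypothetical illustration, not a Bencher type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StaticThreshold:
    """Hypothetical model of a Static Threshold; not part of Bencher's API."""
    lower_boundary: Optional[float] = None
    upper_boundary: Optional[float] = None

    def __post_init__(self) -> None:
        # Either a Lower Boundary, Upper Boundary, or both must be set.
        if self.lower_boundary is None and self.upper_boundary is None:
            raise ValueError("set a Lower Boundary, an Upper Boundary, or both")
        # The Lower Boundary must not exceed the Upper Boundary.
        if (self.lower_boundary is not None
                and self.upper_boundary is not None
                and self.lower_boundary > self.upper_boundary):
            raise ValueError("Lower Boundary must be <= Upper Boundary")

    def is_alert(self, metric: float) -> bool:
        # For a Static Threshold, each Boundary is itself the Boundary Limit.
        below = self.lower_boundary is not None and metric < self.lower_boundary
        above = self.upper_boundary is not None and metric > self.upper_boundary
        return below or above


threshold = StaticThreshold(lower_boundary=100.0)
print(threshold.is_alert(99.9))   # True
print(threshold.is_alert(100.0))  # False
```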
Statistical Thresholds
All other Thresholds are Statistical Thresholds. Each Statistical Threshold uses historical Metrics and a unique statistical significance test to determine whether an Alert is generated. Therefore, the Lower/Upper Boundary will mean different things for different Statistical Thresholds. In addition to setting a Lower Boundary and/or an Upper Boundary, there are controls for which historical Metrics are used (i.e., sampled).
- Minimum Sample Size
- A minimum Sample Size can be set for a Statistical Threshold. The Statistical Threshold will only run its statistical significance test if the number of historical Metrics is greater than or equal to the minimum Sample Size. The minimum Sample Size must be less than or equal to the maximum Sample Size, if both are specified.
- Maximum Sample Size
- A maximum Sample Size can be set for a Statistical Threshold. The Statistical Threshold will limit itself to only the most recent historical Metrics capped at the maximum Sample Size for its statistical significance test. The maximum Sample Size must be greater than or equal to the minimum Sample Size, if both are specified.
- Window Size
- A Window Size in seconds can be set for a Statistical Threshold. The Statistical Threshold will limit itself to only the most recent historical Metrics bounded by the given time window for its statistical significance test.
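One way to picture how these three controls interact is the sketch below. The function select_sample, its parameters, and the order in which the filters are applied (window first, then the maximum cap, then the minimum check) are assumptions made for illustration; the text above does not specify the order Bencher uses.

```python
from typing import List, Optional, Tuple


def select_sample(history: List[Tuple[float, float]],
                  now: float,
                  min_sample_size: Optional[int] = None,
                  max_sample_size: Optional[int] = None,
                  window_seconds: Optional[float] = None) -> Optional[List[float]]:
    """Pick the historical Metric values a Statistical Threshold would test against.

    history holds (timestamp_in_seconds, value) pairs in any order.
    Returns None when there are too few Metrics to run the test.
    """
    # Most recent Metrics first.
    ordered = sorted(history, key=lambda pair: pair[0], reverse=True)
    # Window Size: keep only Metrics inside the time window.
    if window_seconds is not None:
        ordered = [pair for pair in ordered if now - pair[0] <= window_seconds]
    # Maximum Sample Size: cap at the most recent N Metrics.
    if max_sample_size is not None:
        ordered = ordered[:max_sample_size]
    # Minimum Sample Size: skip the test if too few Metrics remain.
    if min_sample_size is not None and len(ordered) < min_sample_size:
        return None
    return [value for _, value in ordered]


history = [(0.0, 1.0), (10.0, 2.0), (20.0, 3.0), (30.0, 4.0)]
print(select_sample(history, now=30.0, max_sample_size=2))  # [4.0, 3.0]
print(select_sample(history, now=30.0, min_sample_size=5))  # None
```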
Percentage Thresholds
A Percentage Threshold (percentage) is the simplest Statistical Threshold.
If a new Metric is below a certain percentage of the mean (Lower Boundary) or above a certain percentage of the mean (Upper Boundary) of your historical Metrics, an Alert is generated.
Either a Lower Boundary, Upper Boundary, or both must be set.
Percentage Thresholds work best when the value of the Metric should stay within a known good range.
- Percentage Threshold Lower Boundary
- A Percentage Threshold Lower Boundary can be any percentage greater than or equal to zero in decimal form (ex: use 0.10 for 10%). It is used when a smaller value would indicate a performance regression.
- For example, if you had a Percentage Threshold with a Lower Boundary set to 0.10 and your historical Metrics had a mean of 100, the Lower Boundary Limit would be 90 and any value less than 90 would generate an Alert.
- Percentage Threshold Upper Boundary
- A Percentage Threshold Upper Boundary can be any percentage greater than or equal to zero in decimal form (ex: use 0.10 for 10%). It is used when a greater value would indicate a performance regression.
- For example, if you had a Percentage Threshold with an Upper Boundary set to 0.10 and your historical Metrics had a mean of 100, the Upper Boundary Limit would be 110 and any value greater than 110 would generate an Alert.
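The arithmetic behind the two examples above can be written out as a small sketch. The function percentage_limits is a hypothetical name, and the formula (mean scaled by one minus or one plus the Boundary) is taken from the worked examples rather than from Bencher's source.

```python
from statistics import mean
from typing import List, Optional, Tuple


def percentage_limits(history: List[float],
                      lower_boundary: Optional[float] = None,
                      upper_boundary: Optional[float] = None
                      ) -> Tuple[Optional[float], Optional[float]]:
    """Compute (Lower Boundary Limit, Upper Boundary Limit) around the mean."""
    center = mean(history)
    # Boundaries are percentages in decimal form, e.g. 0.10 for 10%.
    lower = None if lower_boundary is None else center * (1.0 - lower_boundary)
    upper = None if upper_boundary is None else center * (1.0 + upper_boundary)
    return lower, upper


# Historical Metrics with a mean of 100 and 10% Boundaries on both sides.
lower, upper = percentage_limits([90.0, 100.0, 110.0], 0.10, 0.10)
print(round(lower, 2), round(upper, 2))  # 90.0 110.0
```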
z-score Thresholds
A z-score Threshold (z-score) measures the number of standard deviations (σ) a new Metric is from the mean of your historical Metrics using a z-score.
z-score Thresholds work best when:
- There are no extreme differences between benchmark runs
- Benchmark runs are totally independent of one another
- The number of iterations for a single benchmark run is less than 10% of the historical Metrics
- There are at least 30 historical Metrics (minimum Sample Size >= 30)
For z-score Thresholds, standard deviations are expressed as a decimal cumulative percentage. If a new Metric is below a certain left-side cumulative percentage (Lower Boundary) or above a certain right-side cumulative percentage (Upper Boundary) for your historical Metrics an Alert is generated. Either a Lower Boundary, Upper Boundary, or both must be set.
- z-score Threshold Lower Boundary
- A z-score Threshold Lower Boundary can be any positive decimal between 0.5 and 1.0, where 0.5 represents the mean and 1.0 represents all possible left-side values (-∞). It is used when a smaller value would indicate a performance regression.
- For example, if you used a z-score Threshold with a Lower Boundary of 0.977 and your historical Metrics had a mean of 100 and a standard deviation of 10, the Lower Boundary Limit would be 80.05 and any value less than 80.05 would generate an Alert.
- z-score Threshold Upper Boundary
- A z-score Threshold Upper Boundary can be any positive decimal between 0.5 and 1.0, where 0.5 represents the mean and 1.0 represents all possible right-side values (∞). It is used when a greater value would indicate a performance regression.
- For example, if you used a z-score Threshold with an Upper Boundary of 0.977 and your historical Metrics had a mean of 100 and a standard deviation of 10, the Upper Boundary Limit would be 119.95 and any value greater than 119.95 would generate an Alert.
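To see where 80.05 and 119.95 come from, the sketch below converts the cumulative percentage into a number of standard deviations with the inverse of the standard normal CDF and then offsets the mean. The function z_score_limits is hypothetical, and taking the mean and standard deviation as inputs sidesteps the question of how Bencher estimates them, which the text above does not state.

```python
from statistics import NormalDist
from typing import Optional, Tuple


def z_score_limits(mean: float, std_dev: float,
                   lower_boundary: Optional[float] = None,
                   upper_boundary: Optional[float] = None
                   ) -> Tuple[Optional[float], Optional[float]]:
    """Compute (Lower Boundary Limit, Upper Boundary Limit) for a z-score check."""

    def z(boundary: float) -> float:
        # 0.5 maps to z = 0 (the mean). 1.0 would map to infinity, which
        # inv_cdf cannot represent, so this sketch rejects it.
        if not 0.5 <= boundary < 1.0:
            raise ValueError("boundary must be in [0.5, 1.0)")
        return NormalDist().inv_cdf(boundary)

    lower = None if lower_boundary is None else mean - z(lower_boundary) * std_dev
    upper = None if upper_boundary is None else mean + z(upper_boundary) * std_dev
    return lower, upper


lower, upper = z_score_limits(100.0, 10.0, 0.977, 0.977)
print(round(lower, 2), round(upper, 2))  # 80.05 119.95
```

A Boundary of 0.977 corresponds to roughly two standard deviations, which is why the limits sit about 20 away from a mean of 100 when the standard deviation is 10.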
t-test Thresholds
A t-test Threshold (t-test) measures the confidence interval (CI) for how likely it is that a new Metric is above or below the mean of your historical Metrics using a Student’s t-test.
t-test Thresholds work best when:
- There are no extreme differences between benchmark runs
- Benchmark runs are totally independent of one another
- The number of iterations for a single benchmark run is less than 10% of the historical Metrics
For t-test Thresholds, confidence intervals are expressed as a decimal confidence percentage. If a new Metric is below a certain left-side confidence percentage (Lower Boundary) or above a certain right-side confidence percentage (Upper Boundary) for your historical Metrics an Alert is generated. Either a Lower Boundary, Upper Boundary, or both must be set.
- t-test Threshold Lower Boundary
- A t-test Threshold Lower Boundary can be any positive decimal between 0.5 and 1.0, where 0.5 represents the mean and 1.0 represents all possible left-side values (-∞). It is used when a smaller value would indicate a performance regression.
- For example, if you used a t-test Threshold with a Lower Boundary of 0.977 and you had 25 historical Metrics with a mean of 100 and a standard deviation of 10, the Lower Boundary Limit would be 78.96 and any value less than 78.96 would generate an Alert.
- t-test Threshold Upper Boundary
- A t-test Threshold Upper Boundary can be any positive decimal between 0.5 and 1.0, where 0.5 represents the mean and 1.0 represents all possible right-side values (∞). It is used when a greater value would indicate a performance regression.
- For example, if you used a t-test Threshold with an Upper Boundary of 0.977 and you had 25 historical Metrics with a mean of 100 and a standard deviation of 10, the Upper Boundary Limit would be 121.04 and any value greater than 121.04 would generate an Alert.
Log Normal Thresholds
A Log Normal Threshold (log-normal) measures how likely it is that a new Metric is above or below the center location of your historical Metrics using a Log Normal Distribution.
Log Normal Thresholds work best when:
- Benchmark runs are totally independent of one another
- The number of iterations for a single benchmark run is less than 10% of the historical Metrics
- All data is positive (the natural log of a negative number is undefined)
For Log Normal Thresholds, the likelihood is expressed as a decimal percentage. If a new Metric is below a certain left-side percentage (Lower Boundary) or above a certain right-side percentage (Upper Boundary) for your historical Metrics, an Alert is generated. Either a Lower Boundary, Upper Boundary, or both must be set.
- Log Normal Threshold Lower Boundary
- A Log Normal Threshold Lower Boundary can be any positive decimal between 0.5 and 1.0, where 0.5 represents the center location and 1.0 represents all possible left-side values (-∞). It is used when a smaller value would indicate a performance regression.
- For example, if you used a Log Normal Threshold with a Lower Boundary of 0.977 and you had 25 historical Metrics centered around 100 and one previous outlier at 200, the Lower Boundary Limit would be 71.20 and any value less than 71.20 would generate an Alert.
- Log Normal Threshold Upper Boundary
- A Log Normal Threshold Upper Boundary can be any positive decimal between 0.5 and 1.0, where 0.5 represents the center location and 1.0 represents all possible right-side values (∞). It is used when a greater value would indicate a performance regression.
- For example, if you used a Log Normal Threshold with an Upper Boundary of 0.977 and you had 25 historical Metrics centered around 100 and one previous outlier at 200, the Upper Boundary Limit would be 134.18 and any value greater than 134.18 would generate an Alert.
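A generic way to compute log-normal limits is to fit a normal distribution to the natural logs of the historical Metrics and map the Boundary quantiles back through the exponential. The sketch below does that; log_normal_limits is a hypothetical name, the use of the sample standard deviation is an assumption, and the function is not expected to reproduce the exact 71.20 and 134.18 figures above, since those depend on the specific data and on details of Bencher's calculation that are not given here.

```python
from math import exp, log
from statistics import NormalDist, mean, stdev
from typing import List, Optional, Tuple


def log_normal_limits(history: List[float],
                      lower_boundary: Optional[float] = None,
                      upper_boundary: Optional[float] = None
                      ) -> Tuple[Optional[float], Optional[float]]:
    """Compute limits from a log-normal fit (needs at least two Metrics)."""
    # All data must be positive: the natural log is undefined otherwise.
    if any(value <= 0 for value in history):
        raise ValueError("all historical Metrics must be positive")
    logs = [log(value) for value in history]
    mu, sigma = mean(logs), stdev(logs)  # sample std dev is an assumption
    std_normal = NormalDist()
    lower = (None if lower_boundary is None
             else exp(mu - std_normal.inv_cdf(lower_boundary) * sigma))
    upper = (None if upper_boundary is None
             else exp(mu + std_normal.inv_cdf(upper_boundary) * sigma))
    return lower, upper


# 24 Metrics at 100 plus one outlier at 200 (illustrative data only).
history = [100.0] * 24 + [200.0]
lower, upper = log_normal_limits(history, 0.977, 0.977)
print(lower < 100.0 < upper)  # True
```

Because the limits are computed in log space, the upper limit sits farther from the center than the lower limit does, which suits right-skewed data such as latencies with occasional slow outliers.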
Interquartile Range Thresholds
An Interquartile Range Threshold (iqr) measures how many multiples of the interquartile range (IQR) a new Metric is above or below the median of your historical Metrics.
If a new Metric is below a certain multiple of the IQR from the median (Lower Boundary) or above a certain multiple of the IQR from the median (Upper Boundary) of your historical Metrics, an Alert is generated.
Either a Lower Boundary, Upper Boundary, or both must be set.
- Interquartile Range Threshold Lower Boundary
- An Interquartile Range Threshold Lower Boundary can be any multiplier greater than or equal to zero (ex: use 2.0 for 2x). It is used when a smaller value would indicate a performance regression.
- For example, if you had an Interquartile Range Threshold with a Lower Boundary set to 2.0 and your historical Metrics had a median of 100 and an interquartile range of 10, the Lower Boundary Limit would be 80 and any value less than 80 would generate an Alert.
- Interquartile Range Threshold Upper Boundary
- An Interquartile Range Threshold Upper Boundary can be any multiplier greater than or equal to zero (ex: use 2.0 for 2x). It is used when a greater value would indicate a performance regression.
- For example, if you had an Interquartile Range Threshold with an Upper Boundary set to 2.0 and your historical Metrics had a median of 100 and an interquartile range of 10, the Upper Boundary Limit would be 120 and any value greater than 120 would generate an Alert.
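The IQR examples reduce to the median plus or minus a multiple of the IQR. The sketch below takes the median and IQR as inputs, since the text above does not say which quartile method is used to compute them; iqr_limits is a hypothetical name.

```python
from typing import Optional, Tuple


def iqr_limits(median: float, iqr: float,
               lower_boundary: Optional[float] = None,
               upper_boundary: Optional[float] = None
               ) -> Tuple[Optional[float], Optional[float]]:
    """Compute (Lower Boundary Limit, Upper Boundary Limit) from median and IQR."""
    # Each Boundary is a multiplier of the IQR, e.g. 2.0 for 2x.
    lower = None if lower_boundary is None else median - lower_boundary * iqr
    upper = None if upper_boundary is None else median + upper_boundary * iqr
    return lower, upper


print(iqr_limits(100.0, 10.0, 2.0, 2.0))  # (80.0, 120.0)
```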
Delta Interquartile Range Thresholds
A Delta Interquartile Range Threshold (delta-iqr) measures how many multiples of the average percentage change (Δ) interquartile range (IQR) a new Metric is above or below the median of your historical Metrics.
If a new Metric is below a certain multiple of the ΔIQR from the median (Lower Boundary) or above a certain multiple of the ΔIQR from the median (Upper Boundary) of your historical Metrics, an Alert is generated.
Either a Lower Boundary, Upper Boundary, or both must be set.
- Delta Interquartile Range Threshold Lower Boundary
- A Delta Interquartile Range Threshold Lower Boundary can be any multiplier greater than or equal to zero (ex: use 2.0 for 2x). It is used when a smaller value would indicate a performance regression.
- For example, if you had a Delta Interquartile Range Threshold with a Lower Boundary set to 2.0 and your historical Metrics had a median of 100, an interquartile range of 10, and an average delta interquartile range of 0.2 (20%), the Lower Boundary Limit would be 60 and any value less than 60 would generate an Alert.
- Delta Interquartile Range Threshold Upper Boundary
- A Delta Interquartile Range Threshold Upper Boundary can be any multiplier greater than or equal to zero (ex: use 2.0 for 2x). It is used when a greater value would indicate a performance regression.
- For example, if you had a Delta Interquartile Range Threshold with an Upper Boundary set to 2.0 and your historical Metrics had a median of 100, an interquartile range of 10, and an average delta interquartile range of 0.2 (20%), the Upper Boundary Limit would be 140 and any value greater than 140 would generate an Alert.
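The numbers in the two examples above are consistent with scaling the median by the multiplier times the average ΔIQR: 100 ± 100 × 2.0 × 0.2 gives 60 and 140. The sketch below encodes that reading. The formula is inferred from the worked examples only, so treat it as an assumption about how the limit is formed; delta_iqr_limits is a hypothetical name.

```python
from typing import Optional, Tuple


def delta_iqr_limits(median: float, delta_iqr: float,
                     lower_boundary: Optional[float] = None,
                     upper_boundary: Optional[float] = None
                     ) -> Tuple[Optional[float], Optional[float]]:
    """Compute limits from the median and the average ΔIQR (a decimal percentage).

    Assumed formula, inferred from the examples: median * (1 -/+ multiplier * ΔIQR).
    """
    lower = (None if lower_boundary is None
             else median * (1.0 - lower_boundary * delta_iqr))
    upper = (None if upper_boundary is None
             else median * (1.0 + upper_boundary * delta_iqr))
    return lower, upper


lower, upper = delta_iqr_limits(100.0, 0.2, 2.0, 2.0)
print(round(lower, 2), round(upper, 2))  # 60.0 140.0
```

Under this reading the plain IQR of 10 from the example does not enter the limit directly; only the percentage-change ΔIQR does.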
🐰 Congrats! You have learned all about Thresholds & Alerts! 🎉