Bencher Runner Protocol
The runner binary and the API server communicate over a single WebSocket
connection. This reference describes that protocol: the messages exchanged, the Job lifecycle
they drive, and how timeouts and reconnection keep Jobs from getting stuck. It is the in-depth
companion to the Self-Hosted Runners guide.
You do not need to know any of this to operate a Runner with the runner up command.
It is provided for transparency and for anyone building tooling around the API.
Connection
A Runner maintains a single WebSocket connection to the API server for its entire lifecycle. The same connection handles both Job assignment and Job execution, and it stays open across many Jobs, avoiding a reconnect and handshake for each one.
- Endpoint: /v0/runners/{runner}/channel
- Authentication: the Runner key is sent as an Authorization: Bearer bencher_runner_<key> header when the connection is established.
- Message size: each message is bounded by the server’s request_body_max_bytes limit (applied to both the maximum message size and the maximum frame size). A message that exceeds this limit, such as a completed payload carrying large stdout, stderr, or output files, is rejected at the WebSocket protocol level.
Every message is a JSON object with an event field that identifies its type.
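As a rough illustration of the connection parameters and the message envelope, the Python sketch below builds the channel path, the authentication header, and a size-checked JSON message. It is not Bencher's client code: the helper names are hypothetical, the 1 MiB REQUEST_BODY_MAX_BYTES value stands in for whatever the server is actually configured with, and it assumes the Runner key string already includes the bencher_runner_ prefix and that payload fields sit next to event in the same object.

```python
import json

# Hypothetical value: the real limit is the server's request_body_max_bytes setting.
REQUEST_BODY_MAX_BYTES = 1_048_576

RUNNER_KEY_PREFIX = "bencher_runner_"


def channel_path(runner: str) -> str:
    """Path of the single WebSocket channel used by one Runner."""
    return f"/v0/runners/{runner}/channel"


def auth_header(runner_key: str) -> dict:
    """Header sent once, when the WebSocket connection is established.

    Assumes the key already carries the bencher_runner_ prefix.
    """
    if not runner_key.startswith(RUNNER_KEY_PREFIX):
        raise ValueError("expected a bencher_runner_ key")
    return {"Authorization": f"Bearer {runner_key}"}


def encode_message(event: str, **payload) -> str:
    """Serialize a message as a JSON object whose event field names its type.

    Flat payload fields next to event are an assumption of this sketch.
    """
    text = json.dumps({"event": event, **payload})
    # Check the size locally; the server rejects oversized messages at the protocol level.
    if len(text.encode("utf-8")) > REQUEST_BODY_MAX_BYTES:
        raise ValueError("message exceeds request_body_max_bytes")
    return text
```

Checking the size before sending lets a Runner fail fast with a clear local error instead of having the server reject the message on the wire.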
Runner Messages
Messages sent from the Runner to the server.
| Event | Description | Payload |
|---|---|---|
| ready | The Runner is idle and requesting a Job | Optional poll_timeout (1-900 seconds) and Runner metadata (os, arch, version) |
| running | Job setup is complete and the benchmark is starting | None |
| heartbeat | Periodic liveness signal (about once per second) | None |
| completed | The benchmark completed successfully | job (Job UUID) and results (per-iteration output) |
| failed | The benchmark failed | job (Job UUID), results, and error |
| canceled | Acknowledges a cancellation from the server | job (Job UUID) |
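A minimal sketch of constructing each Runner message follows, assuming payload fields sit alongside event at the top level. The metadata field name used for the os, arch, and version details is hypothetical, as are the helper function names; only the event names, the payload contents, and the 1 to 900 second poll_timeout range come from the table above.

```python
import uuid
from typing import Optional


def ready(poll_timeout: Optional[int] = None, metadata: Optional[dict] = None) -> dict:
    """Idle Runner asking for a Job."""
    msg = {"event": "ready"}
    if poll_timeout is not None:
        # The documented range is 1 to 900 seconds.
        if not 1 <= poll_timeout <= 900:
            raise ValueError("poll_timeout must be between 1 and 900 seconds")
        msg["poll_timeout"] = poll_timeout
    if metadata is not None:
        # Expected keys: os, arch, version. The "metadata" field name is hypothetical.
        msg["metadata"] = metadata
    return msg


def running() -> dict:
    return {"event": "running"}


def heartbeat() -> dict:
    return {"event": "heartbeat"}


def completed(job: uuid.UUID, results: list) -> dict:
    return {"event": "completed", "job": str(job), "results": results}


def failed(job: uuid.UUID, results: list, error: str) -> dict:
    return {"event": "failed", "job": str(job), "results": results, "error": error}


def canceled(job: uuid.UUID) -> dict:
    return {"event": "canceled", "job": str(job)}
```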
Server Messages
Messages sent from the server to the Runner.
| Event | Description | Payload |
|---|---|---|
| ack | Acknowledges a received message | Optional job (Job UUID) |
| job | Assigns a claimed Job to the Runner | The claimed Job: its Spec, Job config, and a short-lived OCI pull token |
| no_job | The poll timeout expired with no Job available | None |
| cancel | The Job was canceled or timed out; stop execution | None |
| update | The Runner should self-update to a new version | version, url (download URL), and checksum (SHA-256) |
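One way a Runner might react to each server message is sketched below, together with a SHA-256 check for the binary referenced by an update message. The action strings returned by next_action and both function names are hypothetical, and the sketch assumes the downloaded binary is already in memory as bytes.

```python
import hashlib
import hmac
import json


def verify_update(binary: bytes, checksum: str) -> bool:
    """Compare a downloaded binary with the SHA-256 checksum from an update message."""
    digest = hashlib.sha256(binary).hexdigest()
    return hmac.compare_digest(digest, checksum.lower())


def next_action(text: str) -> str:
    """Map a server message to the Runner's next step (action names are hypothetical)."""
    msg = json.loads(text)
    event = msg.get("event")
    if event == "ack":
        return "continue"      # the server received our last message
    if event == "job":
        return "start_job"     # payload holds the Spec, Job config, and OCI pull token
    if event == "no_job":
        return "poll_again"    # poll timed out with no Job; send ready again
    if event == "cancel":
        return "stop_and_ack"  # stop the benchmark, then send canceled
    if event == "update":
        return "self_update"   # download msg["url"], check msg["checksum"]
    raise ValueError(f"unknown server event: {event!r}")
```

Refusing to install a binary whose digest does not match keeps a corrupted or tampered download from replacing the Runner.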
The OCI pull token in a job message is generated when the Job is claimed and is never stored.
It is scoped to the single project the Job belongs to, is pull-only, and is short-lived,
so a compromised Runner can only pull Images for the project of the Job it claimed.
Connection Flow
After connecting, the Runner enters an idle polling loop,
sending ready until the server assigns a Job with a job message
(or replies with no_job when the poll times out, or with update when a new version is available).
Once it has a Job, the Runner sends running,
streams heartbeat messages while the benchmark executes,
and finishes with a terminal completed or failed message.
The server acknowledges each message with ack,
and the connection stays open so the Runner returns to the idle loop for the next Job.
If a Job is canceled, the server replies to a heartbeat with cancel.
The Runner stops the benchmark and replies with canceled, which the server acknowledges.
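The flow can be modeled as a small loop against a scripted server. This is a simplified sketch, not the real Runner: the server is any callable that takes a message dict and returns the reply dict, the job reply is assumed to carry the Job UUID in a job field (a hypothetical shape), and the benchmark is reduced to a fixed number of steps with one heartbeat each, whereas a real Runner sends heartbeats about once per second while the benchmark runs.

```python
def run_session(server, benchmark_steps: int):
    """Drive one connection through idle polling and a single Job.

    Returns the list of events sent and how the session ended.
    """
    sent = []

    def send(msg: dict) -> dict:
        sent.append(msg["event"])
        return server(msg)

    # Idle polling loop: keep sending ready until a Job (or an update) arrives.
    while True:
        reply = send({"event": "ready", "poll_timeout": 30})
        if reply["event"] == "job":
            job = reply["job"]  # hypothetical: the Job UUID in a job field
            break
        if reply["event"] == "update":
            return sent, "update"
        # no_job: the poll timed out, so poll again.

    send({"event": "running"})  # setup is done, the benchmark is starting
    for _ in range(benchmark_steps):
        reply = send({"event": "heartbeat"})
        if reply["event"] == "cancel":
            # Stop the benchmark and acknowledge the cancellation.
            send({"event": "canceled", "job": job})
            return sent, "canceled"
    send({"event": "completed", "job": job, "results": []})
    return sent, "completed"
```

Because the connection stays open, a real Runner would go back to the idle polling step after the terminal message rather than returning.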
Job Lifecycle
Each Job moves through a fixed set of states as it is claimed, executed, and processed.
| From | To | Trigger |
|---|---|---|
| pending | claimed | A Runner claims the Job |
| pending | canceled | A user cancels the Job |
| claimed | running | The Runner sends running |
| claimed | failed | The Runner sends failed, or the heartbeat times out |
| claimed | canceled | A user cancels the Job |
| running | completed | The Runner sends completed |
| running | failed | The Runner sends failed, or the heartbeat times out |
| running | canceled | A user cancels the Job, or the hard Job timeout is exceeded |
| completed | processed | The server successfully processes the results |
| failed | completed | The Runner resends completed, overriding a heartbeat-timeout failure |
processed and canceled are terminal.
completed and failed are quasi-terminal:
completed transitions to processed once the results are parsed,
and failed transitions to completed if the Runner resends completed.
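The table and the terminal rules translate directly into a set of allowed (from, to) pairs. The sketch below is illustrative only; TRANSITIONS, can_transition, and outgoing are hypothetical names, not Bencher's implementation.

```python
# Allowed (from, to) pairs, copied from the Job Lifecycle table.
TRANSITIONS = frozenset({
    ("pending", "claimed"),
    ("pending", "canceled"),
    ("claimed", "running"),
    ("claimed", "failed"),
    ("claimed", "canceled"),
    ("running", "completed"),
    ("running", "failed"),
    ("running", "canceled"),
    ("completed", "processed"),
    ("failed", "completed"),  # a resent completed overrides a heartbeat-timeout failure
})

TERMINAL = frozenset({"processed", "canceled"})
QUASI_TERMINAL = frozenset({"completed", "failed"})


def can_transition(current: str, new: str) -> bool:
    """True if the table allows moving from current to new."""
    return (current, new) in TRANSITIONS


def outgoing(state: str) -> set:
    """Every state reachable in one step from state."""
    return {dst for src, dst in TRANSITIONS if src == state}
```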
Every transition uses a status filter on its database update,
so a Job that was concurrently modified is re-read rather than overwritten.
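A status filter works as a compare-and-set: the UPDATE only matches if the row still has the expected status, and a zero row count means another writer got there first. The sketch uses Python's built-in sqlite3 module with a hypothetical job table (id, status); Bencher's actual schema and queries are not shown here.

```python
import sqlite3
from typing import Optional


def transition(conn: sqlite3.Connection, job_id: str, expected: str, new: str) -> Optional[str]:
    """Move a Job from expected to new only if nobody changed it first.

    Returns the Job's status after the attempt, or None if the Job does not exist.
    """
    cur = conn.execute(
        "UPDATE job SET status = ? WHERE id = ? AND status = ?",
        (new, job_id, expected),
    )
    conn.commit()
    if cur.rowcount == 1:
        return new
    # The status filter matched nothing: re-read instead of overwriting.
    row = conn.execute("SELECT status FROM job WHERE id = ?", (job_id,)).fetchone()
    return row[0] if row else None
```

If a user cancels a running Job at the same moment the Runner reports completed, only one of the two updates matches; the other sees the new status on re-read and can decide what to do from there.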
Timeouts & Recovery
Three complementary mechanisms ensure a Job never gets stuck, even if a Runner crashes or loses its connection.
Heartbeat Timeout
While the connection is open, a read timeout detects a Runner that is connected but silent.
Only valid protocol messages reset the timer;
invalid JSON, ping/pong frames, and binary messages do not.
On timeout, a Job that has run longer than its timeout plus a grace period is marked canceled,
and otherwise marked failed (contact with the Runner was lost).
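The sketch below shows a read timeout that only valid protocol messages reset, plus the canceled-or-failed decision made when it fires. Frames are modeled as a (kind, data) pair, with kind one of text, binary, ping, or pong; that representation, the injectable clock, and all names are assumptions for illustration, and the actual heartbeat timeout and grace period values are not specified here.

```python
import json
import time

# Runner events from the Runner Messages table; anything else is not a protocol message.
RUNNER_EVENTS = frozenset({"ready", "running", "heartbeat", "completed", "failed", "canceled"})


def is_protocol_message(kind: str, data) -> bool:
    """Only text frames holding a JSON object with a known event count."""
    if kind != "text":  # ping, pong, and binary frames never reset the timer
        return False
    try:
        msg = json.loads(data)
    except (TypeError, ValueError):  # invalid JSON
        return False
    return isinstance(msg, dict) and msg.get("event") in RUNNER_EVENTS


class HeartbeatTimer:
    def __init__(self, timeout_s: float, now=time.monotonic):
        self.timeout_s = timeout_s
        self.now = now
        self.last_seen = now()

    def on_frame(self, kind: str, data) -> None:
        if is_protocol_message(kind, data):
            self.last_seen = self.now()

    def expired(self) -> bool:
        return self.now() - self.last_seen > self.timeout_s


def timeout_outcome(elapsed_s: float, job_timeout_s: float, grace_s: float) -> str:
    """Status applied when the read timeout fires."""
    if elapsed_s > job_timeout_s + grace_s:
        return "canceled"
    return "failed"  # contact with the Runner was lost
```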
Hard Job Timeout
The server enforces a hard maximum execution duration independent of Runner behavior,
so a buggy or compromised Runner cannot run indefinitely by sending heartbeats.
When the limit (the Job timeout plus a grace period) is exceeded,
the Job is marked canceled and the Runner receives a cancel message.
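Because the hard timeout depends only on elapsed time, it can be expressed as a fixed deadline that ignores heartbeats entirely. In this hypothetical sketch, InFlightJob and check_hard_timeout are illustrative names, and the grace period is a parameter whose real value is not documented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InFlightJob:
    uuid: str
    started_at: float      # seconds on the server's clock
    timeout_s: float       # the Job timeout
    last_heartbeat: float  # kept only to show that it is ignored below


def hard_deadline(job: InFlightJob, grace_s: float) -> float:
    """The Job timeout plus the grace period, measured from the start."""
    return job.started_at + job.timeout_s + grace_s


def check_hard_timeout(job: InFlightJob, now: float, grace_s: float) -> Optional[str]:
    """Return "canceled" once the deadline passes, however fresh the last heartbeat is."""
    if now > hard_deadline(job, grace_s):
        return "canceled"  # the server also sends cancel to the Runner
    return None
```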
Disconnect Recovery
If the connection drops while a Job is still in flight,
the server schedules a check after the heartbeat timeout.
If the Runner has reconnected and resumed heartbeats, the Job continues;
otherwise the Job is marked failed, or canceled if it had exceeded the hard timeout.
On startup, the server also recovers orphaned claimed Jobs,
reschedules timeouts for in-flight Jobs,
and re-processes completed Jobs whose results were stored but not yet parsed.
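The deferred check after a disconnect can be sketched with Python's sched module and an injectable clock, so the example runs instantly. The job dict keys, the resumed flag, and the function names are hypothetical; the decision rule follows the description above.

```python
import sched


def disconnect_outcome(resumed: bool, elapsed_s: float, job_timeout_s: float, grace_s: float) -> str:
    """Decide a Job's fate when the post-disconnect check fires."""
    if resumed:
        return "continue"  # the Runner reconnected and heartbeats resumed
    if elapsed_s > job_timeout_s + grace_s:
        return "canceled"  # already past the hard timeout
    return "failed"        # contact with the Runner was lost


def schedule_disconnect_check(scheduler, clock, job, heartbeat_timeout_s, grace_s, outcomes):
    """Queue one check to run heartbeat_timeout_s after the disconnect."""
    def check():
        elapsed = clock() - job["started_at"]
        # job["resumed"] is read when the check fires, so a reconnect in between counts.
        outcomes[job["uuid"]] = disconnect_outcome(job["resumed"], elapsed, job["timeout_s"], grace_s)

    scheduler.enter(heartbeat_timeout_s, 1, check)
```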
Reconnection & Result Delivery
Reconnection is supported and idempotent.
Resending running for an already-running Job only refreshes its liveness,
and resending a terminal completed, failed, or canceled message is always safe.
Terminal messages carry the Job UUID and receive an ack;
if the connection drops before the ack arrives,
the Runner stores the result and resends it on the next connection before going idle.
A Runner’s actual completed result can even override a heartbeat-timeout failed status.
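On the Runner side, result delivery amounts to keeping each terminal message until a matching ack arrives and replaying it before the first ready on a new connection. This is a minimal sketch with hypothetical names; the store here lives in memory only, and it assumes the ack for a terminal message echoes the Job UUID in its optional job field.

```python
class PendingResults:
    """Terminal messages waiting for an ack, keyed by Job UUID (in memory only)."""

    def __init__(self):
        self.pending = {}

    def save(self, msg: dict) -> None:
        self.pending[msg["job"]] = msg

    def acked(self, job: str) -> None:
        self.pending.pop(job, None)


def on_connect(store: PendingResults, send) -> dict:
    """Resend unacknowledged terminal messages, then return the first idle message."""
    for job, msg in list(store.pending.items()):
        reply = send(msg)  # resending completed, failed, or canceled is always safe
        if reply.get("event") == "ack" and reply.get("job") == job:
            store.acked(job)
        # Without a matching ack, the message stays stored for the next connection.
    return {"event": "ready"}
```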
No Automatic Retry
A failed Job is not retried automatically.
A failed benchmark is a signal, not an error to hide,
so re-running it is left to you.
Job Output
When a Runner sends completed or failed,
the full output is stored in the same OCI storage backend used for container Images,
at the path {project}/output/v0/jobs/{job}.
The stored output contains a per-iteration results array and, on failure, an error string.
Each iteration records its exit_code, stdout, stderr,
and a map of any collected output files to their contents.
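The stored output can be pictured as the dataclasses below, serialized to JSON. The field names exit_code, stdout, stderr, results, and error come from the description above; the output name for the file map, text-only file contents, and the helper names are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Iteration:
    exit_code: int
    stdout: str
    stderr: str
    output: Dict[str, str] = field(default_factory=dict)  # hypothetical name: file path -> contents


@dataclass
class JobOutput:
    results: List[Iteration]
    error: Optional[str] = None  # only present on failure


def output_path(project: str, job: str) -> str:
    """Location of a Job's output in the OCI storage backend."""
    return f"{project}/output/v0/jobs/{job}"


def to_json(out: JobOutput) -> str:
    data = asdict(out)  # converts nested Iteration objects too
    if data["error"] is None:
        del data["error"]
    return json.dumps(data)
```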
After the output is stored, the server runs the benchmark harness adapter on the results
to parse Metrics and Alerts into the Report, transitioning the Job to processed.
The output is returned when a Job is queried with the GET /v0/projects/{project}/jobs/{job} API.
The same request_body_max_bytes limit that bounds WebSocket messages
caps the size of the output a Runner can deliver.