Bencher Runner Protocol

The runner binary and the API server communicate over a single WebSocket connection. This reference describes that protocol: the messages exchanged, the Job lifecycle they drive, and how timeouts and reconnection keep Jobs from getting stuck. It is the in-depth companion to the Self-Hosted Runners guide.

You do not need to know any of this to operate a Runner with `runner up`. It is provided for transparency and for anyone building tooling around the API.

Connection

A Runner maintains a single WebSocket connection to the API server for its entire lifecycle. The same connection handles both Job assignment and Job execution, and it stays open across many Jobs, avoiding a reconnect and handshake for each one.

Endpoint: `/v0/runners/{runner}/channel`
Authentication: the Runner key is sent as an `Authorization: Bearer bencher_runner_<key>` header when the connection is established.
Message size: each message is bounded by the server’s `request_body_max_bytes` limit, which is applied as both the maximum message size and the maximum frame size. A message that exceeds this limit, such as a `completed` payload carrying large stdout, stderr, or output files, is rejected at the WebSocket protocol level.

Every message is a JSON object with an `event` field that identifies its type.
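As a rough illustration of the connection details above, the sketch below builds the channel URL and the `Authorization` header, and checks a message against a size limit before sending. The host name, the limit value, and all function names are assumptions made for this example; only the endpoint path, the header shape, and the `event` field come from this reference.

```python
import json

# Hypothetical values for illustration; neither comes from this reference.
BASE_URL = "wss://bencher.example.com"
REQUEST_BODY_MAX_BYTES = 1_048_576  # assumed; the real limit is server configuration


def channel_url(base_url: str, runner: str) -> str:
    """Build the channel endpoint URL for a Runner identifier."""
    return f"{base_url.rstrip('/')}/v0/runners/{runner}/channel"


def auth_headers(runner_key: str) -> dict:
    """Build the header sent when the connection is established.

    `runner_key` is assumed to be the full key, including its
    `bencher_runner_` prefix.
    """
    return {"Authorization": f"Bearer {runner_key}"}


def encode_message(message: dict, max_bytes: int = REQUEST_BODY_MAX_BYTES) -> str:
    """Serialize a message, refusing locally if it would exceed the limit."""
    if "event" not in message:
        raise ValueError("every message needs an `event` field")
    text = json.dumps(message)
    if len(text.encode("utf-8")) > max_bytes:
        raise ValueError("message exceeds request_body_max_bytes")
    return text
```

Checking the size on the client side is a design choice of this sketch, not documented Runner behavior; it simply surfaces the problem before the server rejects the message.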

Runner Messages

Messages sent from the Runner to the server.

Event	Description	Payload
`ready`	The Runner is idle and requesting a Job	Optional `poll_timeout` (1-900s) and `runner` metadata (`os`, `arch`, `version`, optional update `channel`, and optional binary `checksum`)
`running`	Job setup is complete and the benchmark is starting	None
`heartbeat`	Periodic liveness signal (about once per second)	None
`completed`	The benchmark completed successfully	`job` (Job UUID) and `results` (per-iteration output)
`failed`	The benchmark failed	`job` (Job UUID), `results`, and `error`
`canceled`	Acknowledges a cancellation from the server	`job` (Job UUID)
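The Runner messages in the table could be constructed along these lines. This is a minimal sketch that assumes payload fields sit at the top level of the JSON object next to `event`; the exact wire layout, and the function names, are assumptions rather than something this reference specifies.

```python
from typing import Optional


def ready(poll_timeout: Optional[int] = None, runner: Optional[dict] = None) -> dict:
    """Build a `ready` message; both payload fields are optional."""
    message = {"event": "ready"}
    if poll_timeout is not None:
        # The table above gives the allowed range as 1-900 seconds.
        if not 1 <= poll_timeout <= 900:
            raise ValueError("poll_timeout must be between 1 and 900 seconds")
        message["poll_timeout"] = poll_timeout
    if runner is not None:
        # Metadata such as os, arch, version, channel, and checksum.
        message["runner"] = runner
    return message


def running() -> dict:
    return {"event": "running"}


def heartbeat() -> dict:
    return {"event": "heartbeat"}


def completed(job: str, results: list) -> dict:
    return {"event": "completed", "job": job, "results": results}


def failed(job: str, results: list, error: str) -> dict:
    return {"event": "failed", "job": job, "results": results, "error": error}


def canceled(job: str) -> dict:
    return {"event": "canceled", "job": job}
```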

Server Messages

Messages sent from the server to the Runner.

Event	Description	Payload
`ack`	Acknowledges a received message	Optional `job` (Job UUID)
`job`	Assigns a claimed Job to the Runner	The claimed Job: its Spec, Job config, and a short-lived OCI pull token
`no_job`	The poll timeout expired with no Job available	None
`cancel`	The Job was canceled or timed out; stop execution	None
`update`	The Runner should self-update to a new version	`version`, `url` (download URL), and `checksum` (SHA-256)
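A client receiving these server messages might validate the `event` field and decide what to do next roughly as follows. The action labels returned by `next_action` are invented for this sketch and are not part of the protocol.

```python
import json

# The five server events listed in the table above.
SERVER_EVENTS = {"ack", "job", "no_job", "cancel", "update"}


def parse_server_message(text: str) -> dict:
    """Decode a server message and check that its event is a known one."""
    message = json.loads(text)
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    event = message.get("event")
    if event not in SERVER_EVENTS:
        raise ValueError(f"unknown server event: {event!r}")
    return message


def next_action(message: dict) -> str:
    """Map a server message to an illustrative action label."""
    event = message["event"]
    if event == "job":
        return "start_job"
    if event == "no_job":
        return "poll_again"
    if event == "cancel":
        return "stop_benchmark"
    if event == "update":
        return "self_update"
    return "continue"  # `ack` needs no further action
```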

The OCI pull token in a `job` message is generated when the Job is claimed and is never stored. It is scoped to the single project the Job belongs to, is pull-only, and is short-lived, so a compromised Runner can only pull Images for the project of the Job it claimed.

On the `stable` update channel, an `update` is sent when the Runner version differs from the server version. On the `canary` update channel, `version` is `canary` and an `update` is sent when the Runner’s self-reported binary checksum differs from the published rolling canary build.
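The two update rules can be sketched as a single decision function. The function and parameter names are hypothetical, and the behavior when a canary Runner reports no checksum is not described in this reference, so the sketch makes an explicit assumption there.

```python
from typing import Optional


def needs_update(
    channel: str,
    *,
    runner_version: str,
    server_version: str,
    runner_checksum: Optional[str] = None,
    canary_checksum: Optional[str] = None,
) -> bool:
    """Decide whether an `update` message would be sent to this Runner."""
    if channel == "canary":
        # Assumption: with no checksum to compare, no update is sent.
        if runner_checksum is None or canary_checksum is None:
            return False
        # Canary compares the binary checksum with the published rolling build.
        return runner_checksum != canary_checksum
    # Stable compares the Runner version with the server version.
    return runner_version != server_version
```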

Connection Flow

After connecting, the Runner enters an idle polling loop, sending `ready` until the server assigns a `job` (or returns `no_job` when the poll times out, or `update` when a new version is available). Once it has a Job, the Runner sends `running`, streams `heartbeat` messages while the benchmark executes, and finishes with a terminal `completed` or `failed` message. The server acknowledges each message with `ack`, and the connection stays open so the Runner returns to the idle loop for the next Job.

If a Job is canceled, the server replies to a `heartbeat` with `cancel`. The Runner stops the benchmark and replies with `canceled`, which the server acknowledges.
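The flow described in this section can be traced with a small in-memory simulation. `ScriptedServer` is a hypothetical stand-in for the real server, the `uuid` field on the `job` message is an assumed name, and no WebSocket is involved; the sketch only shows the order of messages.

```python
from collections import deque


class ScriptedServer:
    """In-memory stand-in that replays a fixed list of server replies."""

    def __init__(self, replies):
        self.replies = deque(replies)
        self.received = []

    def exchange(self, message: dict) -> dict:
        """Record the Runner's event and return the next scripted reply."""
        self.received.append(message["event"])
        return self.replies.popleft()


def run_one_job(server: ScriptedServer, heartbeats: int) -> str:
    """Poll for a Job, run it, and return the terminal event that was sent."""
    # Idle loop: keep sending `ready` while the server answers `no_job`.
    reply = server.exchange({"event": "ready"})
    while reply["event"] == "no_job":
        reply = server.exchange({"event": "ready"})
    if reply["event"] != "job":
        return reply["event"]  # e.g. `update`; not handled in this sketch
    job = reply["uuid"]  # hypothetical field name for the Job UUID
    server.exchange({"event": "running"})
    for _ in range(heartbeats):
        reply = server.exchange({"event": "heartbeat"})
        if reply["event"] == "cancel":
            # Stop the benchmark and acknowledge the cancellation.
            server.exchange({"event": "canceled", "job": job})
            return "canceled"
    server.exchange({"event": "completed", "job": job, "results": []})
    return "completed"
```

A real Runner would loop back to `ready` after the terminal message is acknowledged; the sketch stops after one Job to keep the trace short.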

Job Lifecycle

Each Job moves through a fixed set of states as it is claimed, executed, and processed.

From	To	Trigger
pending	claimed	A Runner claims the Job
pending	canceled	A user cancels the Job
claimed	running	The Runner sends `running`
claimed	failed	The Runner sends `failed`, or the heartbeat times out
claimed	canceled	A user cancels the Job
running	completed	The Runner sends `completed`
running	failed	The Runner sends `failed`, or the heartbeat times out
running	canceled	A user cancels the Job, or the hard Job timeout is exceeded
completed	processed	The server successfully processes the results
failed	completed	The Runner resends `completed`, overriding a heartbeat-timeout failure

`processed` and `canceled` are terminal. `completed` and `failed` are quasi-terminal: `completed` transitions to `processed` once the results are parsed, and `failed` transitions to `completed` if the Runner resends `completed`. Every transition uses a status filter on its database update, so a Job that was concurrently modified is re-read rather than overwritten.
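To make the status filter concrete, here is a minimal sketch using SQLite. The `job` table, its columns, and the `transition` helper are hypothetical; the transition map is copied from the table above. The `UPDATE` only matches when the row still has the expected status, so a concurrent change results in zero rows updated instead of an overwrite.

```python
import sqlite3

# Allowed transitions, copied from the lifecycle table.
TRANSITIONS = {
    "pending": {"claimed", "canceled"},
    "claimed": {"running", "failed", "canceled"},
    "running": {"completed", "failed", "canceled"},
    "completed": {"processed"},
    "failed": {"completed"},
}


def transition(db: sqlite3.Connection, job_id: str, expected: str, new: str) -> bool:
    """Move a Job from `expected` to `new`.

    Returns False when the Job no longer has the expected status, which
    tells the caller to re-read the Job rather than overwrite it.
    """
    if new not in TRANSITIONS.get(expected, set()):
        raise ValueError(f"illegal transition: {expected} -> {new}")
    cursor = db.execute(
        "UPDATE job SET status = ? WHERE id = ? AND status = ?",
        (new, job_id, expected),
    )
    db.commit()
    return cursor.rowcount == 1
```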

Timeouts & Recovery

Three complementary mechanisms ensure a Job never gets stuck, even if a Runner crashes or loses its connection.

Heartbeat Timeout

While the connection is open, a read timeout detects a Runner that is connected but silent. Only valid protocol messages reset the timer; invalid JSON, ping/pong frames, and binary messages do not. On timeout, a Job that has run longer than its timeout plus a grace period is marked `canceled`; otherwise it is marked `failed`, because contact with the Runner was lost.
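The rule about which frames reset the timer might look like the sketch below. It assumes that a "valid protocol message" is a text frame containing a JSON object with a known Runner event; the actual validation on the server may be stricter. The frame-kind labels, class name, and timeout value are illustrative.

```python
import json

RUNNER_EVENTS = {"ready", "running", "heartbeat", "completed", "failed", "canceled"}


def resets_timer(frame_kind: str, payload: str = "") -> bool:
    """Return True only for a text frame carrying a valid Runner message."""
    if frame_kind != "text":
        return False  # ping, pong, and binary frames do not count as liveness
    try:
        message = json.loads(payload)
    except json.JSONDecodeError:
        return False  # invalid JSON does not count either
    return isinstance(message, dict) and message.get("event") in RUNNER_EVENTS


class LivenessTimer:
    """Track the last time a valid protocol message was seen."""

    def __init__(self, timeout: float, now: float):
        self.timeout = timeout
        self.last_seen = now

    def observe(self, frame_kind: str, payload: str, now: float) -> None:
        if resets_timer(frame_kind, payload):
            self.last_seen = now

    def expired(self, now: float) -> bool:
        return now - self.last_seen > self.timeout
```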

Hard Job Timeout

The server enforces a hard maximum execution duration independent of Runner behavior, so a buggy or compromised Runner cannot run indefinitely by sending heartbeats. When the limit (the Job timeout plus a grace period) is exceeded, the Job is marked `canceled` and the Runner receives a `cancel` message.
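The key property of the hard timeout is that the deadline is fixed when the Job starts and heartbeats are not an input to it. A minimal sketch, with a hypothetical class name and times expressed as plain seconds:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardDeadline:
    """A deadline fixed at Job start; nothing the Runner sends can move it."""

    started_at: float
    timeout: float  # the Job timeout, in seconds
    grace: float    # the grace period, in seconds (value is an assumption)

    @property
    def expires_at(self) -> float:
        return self.started_at + self.timeout + self.grace

    def exceeded(self, now: float) -> bool:
        return now > self.expires_at
```

For example, `HardDeadline(started_at=0, timeout=60, grace=10)` expires at 70 seconds regardless of how many heartbeats arrive in between.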

Disconnect Recovery

If the connection drops while a Job is still in flight, the server schedules a check after the heartbeat timeout. If the Runner has reconnected and resumed heartbeats, the Job continues; otherwise the Job is marked failed, or canceled if it had exceeded the hard timeout. On startup, the server also recovers orphaned claimed Jobs, reschedules timeouts for in-flight Jobs, and re-processes completed Jobs whose results were stored but not yet parsed.
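The check that runs after a disconnect can be expressed as a small pure function. The parameter names are hypothetical and "resumed heartbeats" is approximated here as "a heartbeat arrived after the disconnect"; the server's real bookkeeping is not described in this reference.

```python
def check_after_disconnect(
    *,
    disconnected_at: float,
    last_heartbeat_at: float,
    started_at: float,
    now: float,
    timeout: float,
    grace: float,
) -> str:
    """Decide a Job's fate when the scheduled post-disconnect check fires."""
    # The Runner reconnected and resumed heartbeats: leave the Job alone.
    if last_heartbeat_at > disconnected_at:
        return "continue"
    # The Runner is gone and the hard timeout has already passed.
    if now - started_at > timeout + grace:
        return "canceled"
    # The Runner is gone within the allowed duration: contact was lost.
    return "failed"
```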

Reconnection & Result Delivery

Reconnection is supported and idempotent. Resending `running` for an already-running Job only refreshes its liveness, and resending a terminal `completed`, `failed`, or `canceled` message is always safe. Terminal messages carry the Job UUID and receive an `ack`; if the connection drops before the `ack` arrives, the Runner stores the result and resends it on the next connection before going idle. A Runner’s actual `completed` result can even override a heartbeat-timeout `failed` status.
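One way to picture the store-and-resend behavior is an outbox that holds a terminal message until it is acknowledged. This is a sketch under assumptions: the class is hypothetical, the result is held in memory (where the Runner actually stores it is not stated here), and `send` is any callable that returns True once the server's `ack` has arrived.

```python
from typing import Callable, Optional


class ResultOutbox:
    """Hold a terminal message until the server acknowledges it."""

    def __init__(self) -> None:
        self.pending: Optional[dict] = None

    def send_terminal(self, message: dict, send: Callable[[dict], bool]) -> bool:
        """Store the message first, then try to deliver it."""
        self.pending = message
        return self.flush(send)

    def flush(self, send: Callable[[dict], bool]) -> bool:
        """Resend any stored message; call this on reconnect before going idle."""
        if self.pending is None:
            return True  # nothing to resend
        try:
            acked = send(self.pending)
        except ConnectionError:
            acked = False  # the connection dropped before the ack arrived
        if acked:
            self.pending = None
        return acked
```

Because resending a terminal message is safe, the outbox does not need to know whether the server received the first attempt; it only clears the message once an `ack` is seen.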

No Automatic Retry

A failed Job is not retried automatically. A failed benchmark is a signal, not an error to hide, so re-running it is left to you.

Job Output

When a Runner sends `completed` or `failed`, the full output is stored in the same OCI storage backend used for container Images, at the path `{project}/output/v0/jobs/{job}`.

The stored output contains a per-iteration `results` array and, on failure, an `error` string. Each iteration records its `exit_code`, `stdout`, `stderr`, and a map of any collected output files to their contents. After the output is stored, the server runs the benchmark harness adapter on the results to parse Metrics and Alerts into the Report, transitioning the Job to `processed`.
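The shape of the stored output could be modeled as follows. The `results`, `error`, `exit_code`, `stdout`, and `stderr` names come from the description above; the `output` field name for the file map, the class names, and the choice to omit `error` when it is absent are assumptions of this sketch.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class Iteration:
    exit_code: int
    stdout: str
    stderr: str
    # Hypothetical field name: maps each collected file path to its contents.
    output: Dict[str, str] = field(default_factory=dict)


@dataclass
class JobOutput:
    results: List[Iteration]
    error: Optional[str] = None  # only present on failure


def to_json(job_output: JobOutput) -> str:
    """Serialize the output, leaving `error` out when the Job succeeded."""
    data = asdict(job_output)
    if data["error"] is None:
        del data["error"]
    return json.dumps(data)
```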

The output is returned when a Job is queried with the `GET /v0/projects/{project}/jobs/{job}` API. The same `request_body_max_bytes` limit that bounds WebSocket messages caps the size of the output a Runner can deliver.

Published: Fri, June 19, 2026 at 8:00:00 AM UTC | Last Updated: Tue, July 7, 2026 at 8:00:00 AM UTC