Bencher Runner Protocol


The runner binary and the API server communicate over a single WebSocket connection. This reference describes that protocol: the messages exchanged, the Job lifecycle they drive, and how timeouts and reconnection keep Jobs from getting stuck. It is the in-depth companion to the Self-Hosted Runners guide.

You do not need to know any of this to operate a Runner with the runner up command. It is provided for transparency and for anyone building tooling around the API.


Connection

A Runner maintains a single WebSocket connection to the API server for its entire lifecycle. The same connection handles both Job assignment and Job execution, and it stays open across many Jobs, avoiding a reconnect and handshake for each one.

  • Endpoint: /v0/runners/{runner}/channel
  • Authentication: the Runner key is sent as an Authorization: Bearer bencher_runner_<key> header when the connection is established.
  • Message size: each message is bounded by the server’s request_body_max_bytes limit (applied to both the maximum message and frame size). A message that exceeds this limit, such as a completed payload carrying large stdout, stderr, or output files, is rejected at the WebSocket protocol level.

Every message is a JSON object with an event field that identifies its type.


Runner Messages

Messages sent from the Runner to the server.

Event     | Description                                          | Payload
ready     | The Runner is idle and requesting a Job              | Optional poll_timeout (1-900s) and runner metadata (os, arch, version)
running   | Job setup is complete and the benchmark is starting  | None
heartbeat | Periodic liveness signal (about once per second)     | None
completed | The benchmark completed successfully                 | job (Job UUID) and results (per-iteration output)
failed    | The benchmark failed                                 | job (Job UUID), results, and error
canceled  | Acknowledges a cancellation from the server          | job (Job UUID)
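
As a minimal sketch of the Runner message shapes listed above, the helpers below build each message as a JSON object. Only the event field and the payload names (poll_timeout, job, results, error, os, arch, version) come from this reference. The helper names, the placement of the metadata fields at the top level of the object, and the empty defaults are assumptions made for illustration.

```python
import json
from typing import Optional


def ready(poll_timeout: Optional[int] = None, **metadata: str) -> str:
    """Build a `ready` message; metadata may include os, arch, and version."""
    message = {"event": "ready"}
    if poll_timeout is not None:
        # The reference gives the allowed range as 1-900 seconds.
        if not 1 <= poll_timeout <= 900:
            raise ValueError("poll_timeout must be between 1 and 900 seconds")
        message["poll_timeout"] = poll_timeout
    # Assumption: metadata fields sit at the top level of the object.
    message.update(metadata)
    return json.dumps(message)


def bare_event(event: str) -> str:
    """Build a payload-free message: `running` or `heartbeat`."""
    if event not in ("running", "heartbeat"):
        raise ValueError(f"{event!r} is not a payload-free Runner event")
    return json.dumps({"event": event})


def terminal(event: str, job: str, results: Optional[list] = None,
             error: Optional[str] = None) -> str:
    """Build `completed`, `failed`, or `canceled`; each carries the Job UUID."""
    if event not in ("completed", "failed", "canceled"):
        raise ValueError(f"{event!r} is not a terminal Runner event")
    message = {"event": event, "job": job}
    if event != "canceled":
        message["results"] = results if results is not None else []
    if event == "failed":
        message["error"] = error or ""
    return json.dumps(message)
```

For example, ready(30, os="linux") produces an object with event, poll_timeout, and os keys, while terminal("canceled", job) carries only the event and the Job UUID.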

Server Messages

Messages sent from the server to the Runner.

Event  | Description                                         | Payload
ack    | Acknowledges a received message                     | Optional job (Job UUID)
job    | Assigns a claimed Job to the Runner                 | The claimed Job: its Spec, Job config, and a short-lived OCI pull token
no_job | The poll timeout expired with no Job available      | None
cancel | The Job was canceled or timed out; stop execution   | None
update | The Runner should self-update to a new version      | version, url (download URL), and checksum (SHA-256)

The OCI pull token in a job message is generated when the Job is claimed and is never stored. It is scoped to the single project the Job belongs to, is pull only, and is short-lived, so a compromised Runner can only pull Images for the project of the Job it claimed.
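
A client-side sketch of handling server messages might look like the following. It checks the event field against the five server events listed above and verifies a downloaded update against the checksum field. The function names are hypothetical, and the assumption that checksum is a hex-encoded SHA-256 digest is not confirmed by this reference.

```python
import hashlib
import json

# Event names taken from the server message list in this reference.
SERVER_EVENTS = {"ack", "job", "no_job", "cancel", "update"}


def parse_server_message(raw: str) -> dict:
    """Decode a server message and reject anything without a known event."""
    message = json.loads(raw)
    if not isinstance(message, dict) or message.get("event") not in SERVER_EVENTS:
        raise ValueError(f"unexpected server message: {raw!r}")
    return message


def update_is_intact(update: dict, downloaded: bytes) -> bool:
    """Compare the downloaded bytes against the `checksum` in an update message.

    Assumption: the checksum is a hex-encoded SHA-256 digest.
    """
    digest = hashlib.sha256(downloaded).hexdigest()
    return digest == str(update["checksum"]).lower()
```

Rejecting unknown events up front keeps the rest of a client free of defensive checks, and refusing to install a binary whose digest does not match is the point of shipping a checksum with the update.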


Connection Flow

After connecting, the Runner enters an idle polling loop. It sends ready, and the server replies with job when it assigns one, no_job when the poll times out, or update when a new version is available. Once it has a Job, the Runner sends running, streams heartbeat messages while the benchmark executes, and finishes with a terminal completed or failed message. The server acknowledges each of these with ack, and the connection stays open so the Runner returns to the idle loop for the next Job.

If a Job is canceled, the server replies to a heartbeat with cancel. The Runner stops the benchmark and replies with canceled, which the server acknowledges.
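
The flow described above can be sketched as a small simulation that replays a scripted list of server replies and records what the Runner sends. This is an illustration only: it uses no real WebSocket, the function name is hypothetical, and it assumes exactly one server reply per Runner message.

```python
from collections import deque


def runner_session(replies, heartbeats=2):
    """Replay one connection against scripted server replies.

    `replies` is a list of server event names, consumed one per Runner
    message. Returns the list of event names the Runner sent.
    """
    inbox = deque(replies)
    sent = []

    def send(event):
        sent.append(event)
        return inbox.popleft()  # sketch: exactly one reply per message

    while inbox:
        reply = send("ready")
        if reply == "no_job":
            continue  # poll timed out; ask again
        if reply == "update":
            break  # a real Runner would self-update and reconnect here
        if reply != "job":
            raise ValueError(f"unexpected reply to ready: {reply!r}")
        send("running")
        for _ in range(heartbeats):
            if send("heartbeat") == "cancel":
                send("canceled")  # acknowledge the cancellation
                break
        else:
            send("completed")  # no cancel arrived; deliver the result
    return sent
```

With replies of job, ack, cancel, ack the Runner sends ready, running, heartbeat, canceled, which matches the cancellation path in the preceding paragraph.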

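[Sequence diagram between Runner and API Server. The Runner connects with its runner key and the server confirms the connection. Idle / polling loop: the Runner sends ready (os, arch, version); the server replies with job (Spec, config, OCI token) if a Job is available, no_job on poll timeout, or update (version, url, checksum) if an update is available. The Runner sends running and the server replies ack. While the benchmark executes, the Runner sends heartbeat and the server replies ack (or cancel). The Runner sends completed (job, results) and the server replies ack.]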

Job Lifecycle

Each Job moves through a fixed set of states as it is claimed, executed, and processed.

From      | To        | Trigger
pending   | claimed   | A Runner claims the Job
pending   | canceled  | A user cancels the Job
claimed   | running   | The Runner sends running
claimed   | failed    | The Runner sends failed, or the heartbeat times out
claimed   | canceled  | A user cancels the Job
running   | completed | The Runner sends completed
running   | failed    | The Runner sends failed, or the heartbeat times out
running   | canceled  | A user cancels the Job, or the hard Job timeout is exceeded
completed | processed | The server successfully processes the results
failed    | completed | The Runner resends completed, overriding a heartbeat-timeout failure

processed and canceled are terminal. completed and failed are quasi-terminal: completed transitions to processed once the results are parsed, and failed transitions to completed if the Runner resends completed. Every transition uses a status filter on its database update, so a Job that was concurrently modified is re-read rather than overwritten.
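
The status filter can be illustrated with a conditional UPDATE: the row changes only if the Job is still in the state the caller expects. The sketch below uses an in-memory SQLite database; the table name, column names, and function name are hypothetical and are not taken from the server's actual schema.

```python
import sqlite3

# Allowed transitions, copied from the lifecycle list in this reference.
TRANSITIONS = {
    "pending": {"claimed", "canceled"},
    "claimed": {"running", "failed", "canceled"},
    "running": {"completed", "failed", "canceled"},
    "completed": {"processed"},
    "failed": {"completed"},
    "processed": set(),
    "canceled": set(),
}


def transition(db: sqlite3.Connection, job_id: str, expected: str, new: str) -> bool:
    """Move a Job from `expected` to `new` only if it is still in `expected`.

    Returns False when no row matched, meaning the Job was modified
    concurrently and the caller should re-read it before deciding again.
    """
    if new not in TRANSITIONS[expected]:
        raise ValueError(f"illegal transition: {expected} -> {new}")
    cursor = db.execute(
        "UPDATE job SET status = ? WHERE id = ? AND status = ?",
        (new, job_id, expected),
    )
    db.commit()
    return cursor.rowcount == 1
```

If two callers race, for instance a Runner claiming a Job while a user cancels it, only the first UPDATE matches the pending row; the second returns False instead of overwriting the new status.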

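[State diagram: pending -> claimed (runner claims); pending -> canceled (user cancels); claimed -> running (running); claimed -> failed (failed / timeout); claimed -> canceled (user cancels); running -> completed (completed); running -> failed (failed / timeout); running -> canceled (cancel / hard timeout); failed -> completed (results recovered); completed -> processed (results parsed).]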


Timeouts & Recovery

Three complementary mechanisms ensure a Job never gets stuck, even if a Runner crashes or loses its connection.

Heartbeat Timeout

While the connection is open, a read timeout detects a Runner that is connected but silent. Only valid protocol messages reset the timer; invalid JSON, ping/pong frames, and binary messages do not. When the timer expires, the server marks the Job canceled if it has already run longer than its timeout plus a grace period; otherwise it marks the Job failed, because contact with the Runner was lost.
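
A minimal sketch of the two decisions in this paragraph follows: which incoming frames count as liveness, and which status a Job receives when the timer expires. The frame-type labels, the function names, and the reading of "valid protocol message" as a JSON object with a known Runner event are assumptions for illustration.

```python
import json

# Runner event names from the Runner message list in this reference.
RUNNER_EVENTS = {"ready", "running", "heartbeat", "completed", "failed", "canceled"}


def resets_timer(frame_type: str, data) -> bool:
    """Return True only for text frames holding a valid protocol message.

    `frame_type` is one of the hypothetical labels "text", "binary",
    "ping", or "pong".
    """
    if frame_type != "text":
        return False  # ping/pong and binary frames never count as liveness
    try:
        message = json.loads(data)
    except (ValueError, TypeError):
        return False  # invalid JSON does not count either
    return isinstance(message, dict) and message.get("event") in RUNNER_EVENTS


def status_on_heartbeat_timeout(elapsed_s: float, job_timeout_s: float,
                                grace_s: float) -> str:
    """Pick the Job status once the read timeout has fired."""
    if elapsed_s > job_timeout_s + grace_s:
        return "canceled"  # the Job had already outlived its allowance
    return "failed"  # contact with the Runner was lost
```

Ignoring ping/pong frames means a Runner whose process is wedged but whose network stack still answers pings is treated as silent.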

Hard Job Timeout

The server enforces a hard maximum execution duration independent of Runner behavior, so a buggy or compromised Runner cannot run indefinitely by sending heartbeats. When the limit (the Job timeout plus a grace period) is exceeded, the Job is marked canceled and the Runner receives a cancel message.
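
To illustrate why heartbeats cannot extend a Job, the sketch below fixes the deadline when the Job starts and lets heartbeats refresh liveness only. The class, its field names, and the choice to answer the late heartbeat itself with cancel are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class RunningJob:
    started_at: float   # seconds, from any monotonic clock
    timeout_s: float    # the Job timeout
    grace_s: float      # the grace period added on top
    last_seen: float = 0.0

    @property
    def deadline(self) -> float:
        # Fixed at start; nothing the Runner sends can move it.
        return self.started_at + self.timeout_s + self.grace_s

    def on_heartbeat(self, now: float) -> str:
        """Refresh liveness, then reply `ack` or `cancel`."""
        self.last_seen = now
        return "cancel" if now > self.deadline else "ack"
```

A Job started at t=0 with a 60 second timeout and a 10 second grace period gets ack for a heartbeat at t=70 and cancel for one at t=71, no matter how regularly the heartbeats arrived before that.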

Disconnect Recovery

If the connection drops while a Job is still in flight, the server schedules a check after the heartbeat timeout. If the Runner has reconnected and resumed heartbeats, the Job continues; otherwise the Job is marked failed, or canceled if it had exceeded the hard timeout. On startup, the server also recovers orphaned claimed Jobs, reschedules timeouts for in-flight Jobs, and re-processes completed Jobs whose results were stored but not yet parsed.

Reconnection & Result Delivery

Reconnection is supported and idempotent. Resending running for an already-running Job only refreshes its liveness, and resending a terminal completed, failed, or canceled message is always safe. Terminal messages carry the Job UUID and receive an ack; if the connection drops before the ack arrives, the Runner stores the result and resends it on the next connection before going idle. A Runner’s actual completed result can even override a heartbeat-timeout failed status.
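
The store-and-resend behavior can be sketched as a one-slot outbox that keeps a terminal message until an ack arrives. The class name, the in-memory storage, and the send callable that raises ConnectionError on a dropped connection are assumptions; this reference does not say how the Runner stores the result.

```python
from typing import Callable, Optional


class ResultOutbox:
    """Holds at most one terminal message until the server acks it."""

    def __init__(self) -> None:
        self.pending: Optional[dict] = None

    def deliver(self, send: Callable[[dict], dict],
                message: Optional[dict] = None) -> bool:
        """Send a new or previously stored result; return True once acked.

        `send` is a hypothetical transport function that returns the
        server's reply or raises ConnectionError if the connection drops.
        """
        if message is not None:
            self.pending = message
        if self.pending is None:
            return True  # nothing waiting; safe to go idle
        try:
            reply = send(self.pending)
        except ConnectionError:
            return False  # keep the result for the next connection
        if reply.get("event") == "ack":
            self.pending = None
            return True
        return False
```

On each new connection a Runner built this way would call deliver(send) before sending ready. Because resending a terminal message is safe, delivering the same result twice does no harm.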

No Automatic Retry

A failed Job is not retried automatically. A failed benchmark is signal, not an error to hide, so re-running it is left to you.


Job Output

When a Runner sends completed or failed, the full output is stored in the same OCI storage backend used for container Images, at the path {project}/output/v0/jobs/{job}.

The stored output contains a per-iteration results array and, on failure, an error string. Each iteration records its exit_code, stdout, stderr, and a map of any collected output files to their contents. After the output is stored, the server runs the benchmark harness adapter on the results to parse Metrics and Alerts into the Report, transitioning the Job to processed.
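
As a rough sketch of the stored shape, the helpers below assemble one output document and its storage path. The exit_code, stdout, stderr, results, and error names and the path template come from this reference; the helper names and the key used for the output files map (output_files here) are assumptions, since the reference does not name that field.

```python
def output_path(project: str, job: str) -> str:
    """Storage path for a Job's output, following the template in this reference."""
    return f"{project}/output/v0/jobs/{job}"


def iteration(exit_code: int, stdout: str = "", stderr: str = "",
              files=None) -> dict:
    """One entry of the per-iteration results array."""
    return {
        "exit_code": exit_code,
        "stdout": stdout,
        "stderr": stderr,
        # Assumption: the real field name for this map is not documented here.
        "output_files": dict(files or {}),
    }


def job_output(results, error=None) -> dict:
    """The full stored output; `error` appears only on failure."""
    document = {"results": list(results)}
    if error is not None:
        document["error"] = error
    return document
```

A failed Job with one iteration would then be stored as job_output([iteration(1, stderr="...")], error="...") at output_path(project, job).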

The output is returned when a Job is queried with the GET /v0/projects/{project}/jobs/{job} API. The same request_body_max_bytes limit that bounds WebSocket messages caps the size of the output a Runner can deliver.
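
Because an oversized message is rejected outright, a Runner-side tool may want to measure a terminal message before sending it. The sketch below assumes the limit applies to the UTF-8 encoded JSON text and that a message exactly at the limit is accepted; both are assumptions, and the function names are hypothetical.

```python
import json


def encoded_size(message: dict) -> int:
    """Size in bytes of the message as UTF-8 encoded JSON text."""
    return len(json.dumps(message).encode("utf-8"))


def fits(message: dict, request_body_max_bytes: int) -> bool:
    """True if the serialized message is within the configured limit."""
    return encoded_size(message) <= request_body_max_bytes
```

A tool that finds fits(...) false could, for instance, truncate stdout or drop large output files before sending, rather than losing the whole result to a protocol-level rejection.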



Published: Fri, June 19, 2026 at 8:00:00 AM UTC