CI Reporting
`conbench ci report` turns submitted benchmark results into a synchronous CI
diagnostic. It prints JSON or Markdown, exits with a CI-friendly status, and can
link to the dashboard report page.
Metadata Contract
The submitted payloads must match the report selector:
- `github.commit` must equal the SHA passed to `conbench ci report --commit`.
- `github.repository` must normalize to the repository passed to `--repository`.
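If the benchmark payloads stamp a merge commit while the report selects a head
commit (or the reverse), or omit commit metadata entirely, the report returns
`action_required` because it cannot find the current run. In GitHub Actions
`pull_request` workflows, `GITHUB_SHA` is the merge commit GitHub creates for
the pull request rather than the PR head commit (available as
`github.event.pull_request.head.sha`), so stamp the payloads and pass `--commit`
from the same source.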
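A preflight check before submission can catch a commit mismatch early. The sketch below is a hypothetical helper, not part of the Conbench CLI. It assumes `jq` is available and that each result file holds a single JSON object with a top-level `github.commit` field; it does not check `github.repository`, because the normalization rules are not described on this page.

```shell
# Hypothetical preflight helper (not a Conbench command).
# Usage: check_commit_stamps EXPECTED_SHA RESULTS_DIR
# Returns non-zero and lists offending files when any result file's
# github.commit is missing or differs from EXPECTED_SHA.
# Assumes one JSON object per file.
check_commit_stamps() {
  local expected="$1" dir="$2" file commit rc=0
  for file in "$dir"/*.json; do
    # With no matches the glob stays literal, so skip non-existent paths.
    [ -e "$file" ] || continue
    commit=$(jq -r '.github.commit // empty' "$file")
    if [ "$commit" != "$expected" ]; then
      echo "$file: github.commit is '${commit:-<missing>}', expected '$expected'" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

In a workflow, such a check could run as `check_commit_stamps "$GITHUB_SHA" bench-results` before the submit step.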
Set `CONBENCH_TOKEN` in the CI environment. The examples below rely on
`CONBENCH_TOKEN` rather than `--token` so that the token does not appear in
process arguments; `--token` remains available for explicit local overrides.
GitHub Actions Fragment
```yaml
- name: Run benchmarks
  run: ./scripts/run-benchmarks --output bench-results
- name: Collect current-attempt run IDs
  id: run_ids
  run: |
    RUN_IDS="$(jq -r 'if type == "array" then .[]?.run_id else .run_id end // empty' bench-results/*.json | sort -u | paste -sd, -)"
    echo "run_ids=$RUN_IDS" >> "$GITHUB_OUTPUT"
- name: Submit benchmark results
  run: |
    # The default run shell does not set pipefail; without it, tee's exit
    # status would hide a failed submit.
    set -o pipefail
    conbench results submit "bench-results/*.json" \
      --server "$CONBENCH_SERVER_URL" | tee conbench-submit.jsonl
- name: Render Conbench CI report
  run: |
    RUN_IDS="${{ steps.run_ids.outputs.run_ids }}"
    # Capture the exit status instead of failing immediately, so the
    # summary is still published when the report exits 1.
    set +e
    conbench ci report \
      --server "$CONBENCH_SERVER_URL" \
      --repository "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}" \
      --commit "$GITHUB_SHA" \
      ${RUN_IDS:+--run-ids "$RUN_IDS"} \
      --format markdown \
      --output conbench-report.md
    report_status=$?
    set -e
    if [ -f conbench-report.md ]; then
      cat conbench-report.md >> "$GITHUB_STEP_SUMMARY"
    fi
    exit "$report_status"
```
The shape-tolerant `jq` expression accepts result files that contain either a
single result object or an array of results, which is convenient during local
experimentation. The submit step still expects one Conbench result object per
file, so split array-shaped output before calling `conbench results submit`.
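One way to do that split is sketched below with `jq` (1.5 or later, for `--argjson`). `split_array_results` is a hypothetical helper: it rewrites each array-shaped `.json` file in a directory into numbered files holding one element each (`name-0.json`, `name-1.json`, and so on) and deletes the original. It assumes those numbered names do not already exist.

```shell
# Hypothetical helper: split array-shaped result files into one object per file.
# Usage: split_array_results RESULTS_DIR
split_array_results() {
  local dir="$1" file base n i
  # The glob is expanded once, so files created below are not revisited.
  for file in "$dir"/*.json; do
    [ -e "$file" ] || continue
    [ "$(jq -r 'type' "$file")" = "array" ] || continue
    base="${file%.json}"
    n=$(jq 'length' "$file")
    i=0
    while [ "$i" -lt "$n" ]; do
      # Write element i as compact JSON to NAME-i.json.
      jq -c --argjson i "$i" '.[$i]' "$file" > "${base}-${i}.json"
      i=$((i + 1))
    done
    # An empty array simply produces no files.
    rm -- "$file"
  done
}
```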
Selector Modes
Use commit-wide mode when you want the simplest report:
```shell
conbench ci report \
  --server "$CONBENCH_SERVER_URL" \
  --repository "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}" \
  --commit "$GITHUB_SHA"
```
Use current-attempt mode when the benchmark results from one CI attempt share a
`run_id` and you want to report on that attempt only:
```shell
conbench ci report \
  --server "$CONBENCH_SERVER_URL" \
  --repository "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}" \
  --commit "$GITHUB_SHA" \
  --run-ids "$RUN_IDS"
```
Use explicit run-comparison mode when you already know the contender and baseline run IDs:
```shell
conbench ci report \
  --server "$CONBENCH_SERVER_URL" \
  --run-ids "contender-run" \
  --baseline-run-ids "baseline-run" \
  --format markdown
```
`--baseline-run-ids` is paired by position with `--run-ids`, so both lists must
have the same number of entries. Do not combine it with `--baseline`; automatic
baseline selection and explicit run comparison are separate modes.
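Because the pairing is positional, it can help to print the pairs before running the report. The hypothetical `pair_run_ids` helper below assumes both lists are comma-separated, like the `--run-ids` value built in the workflow fragment, and fails without printing any pairs when the counts differ.

```shell
# Hypothetical helper: show how --run-ids entries pair with --baseline-run-ids.
# Usage: pair_run_ids "c1,c2" "b1,b2"   (comma-separated lists assumed)
pair_run_ids() {
  awk -v contenders="$1" -v baselines="$2" 'BEGIN {
    nc = split(contenders, c, ",")
    nb = split(baselines, b, ",")
    if (nc != nb) {
      printf("run-id count mismatch: %d contender vs %d baseline\n", nc, nb) > "/dev/stderr"
      exit 1
    }
    # Entry i of one list is compared with entry i of the other.
    for (i = 1; i <= nc; i++)
      print c[i] " -> " b[i]
  }'
}
```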
Baseline Modes
Automatic baseline selection is controlled by `--baseline`:

| Mode | Meaning |
|---|---|
| `fork_point` | Compare against the merge base or fork point of the selected commit. This is the default, PR-oriented mode. |
| `parent` | Compare against the selected commit's parent. Use this for step-by-step commit diagnostics. |
| `latest_default` | Compare against the latest known default-branch run before the selected commit. This is useful when parent or fork-point runs are unavailable. |
When no baseline run is found, the report includes a typed baseline error and
the list of commits that were searched. The default ancestry search is bounded;
if it is exhausted, narrow the report with `--run-ids` or provide explicit
`--baseline-run-ids`.
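Workflows that run on several event types can pick the mode per event. The mapping below is only one possible policy, not a Conbench default: `choose_baseline_mode` is a hypothetical helper that uses `fork_point` for pull requests, `parent` for pushes, and `latest_default` for anything else, such as scheduled runs.

```shell
# Hypothetical policy helper: map a GitHub event name to a --baseline mode.
choose_baseline_mode() {
  case "$1" in
    pull_request) echo fork_point ;;  # compare the PR against where it forked
    push) echo parent ;;              # commit-by-commit diagnostics
    *) echo latest_default ;;         # e.g. schedule, workflow_dispatch
  esac
}
```

A step could then pass `--baseline "$(choose_baseline_mode "$GITHUB_EVENT_NAME")"`. Since `fork_point` is already the default, pull request runs could equally omit the flag.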
Report Status
The report status is chosen from the whole report, not from one page or one row. Precedence, from highest to lowest, is:
- Request, authentication, transport, or decode errors: no report is produced, and the CLI exits `2`.
- `action_required`: selected runs are missing, no contender results are found, benchmark results contain error payloads, or commit metadata is too incomplete to resolve a requested baseline.
- `failure`: at least one row has a lookback z-score regression.
- `skipped`: results exist, but no row has a computable lookback z-score analysis. Pairwise-only changes do not make a report pass or fail.
- `success`: at least one row has a computable lookback z-score analysis and no higher-precedence condition applies.
`compared` counts rows where both sides were present and a pairwise comparison
was attempted. `analyzed` counts rows with a computable `lookback_z_score`.
`regressions` and `improvements` count lookback verdicts, not pairwise-only
threshold breaches.
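The precedence rules can be written as a small decision function. This is a sketch of the documented rules, not Conbench's implementation; `report_status` and its three numeric arguments are hypothetical. Request and transport errors are not modelled, because they produce no report at all.

```shell
# Hypothetical sketch of the documented status precedence.
# $1  1 if any action_required condition holds (missing runs, no contender
#     results, error payloads, unresolvable baseline), else 0
# $2  number of rows with a lookback z-score regression
# $3  number of rows with a computable lookback z-score ("analyzed")
report_status() {
  if [ "$1" -ne 0 ]; then
    echo action_required
  elif [ "$2" -gt 0 ]; then
    echo failure
  elif [ "$3" -eq 0 ]; then
    echo skipped   # results exist, but nothing is analyzable
  else
    echo success
  fi
}
```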
Row Universe
A report row represents one contender result and, when available, the matching baseline result with the same history fingerprint. Rows are grouped under runs, but hardware is recorded per row, because a single run can contain results from multiple machines.
Row statuses mean:

| Status | Meaning |
|---|---|
| `regressed` | Lookback z-score crossed the regression threshold. |
| `improved` | Lookback z-score crossed the improvement threshold. |
| `stable` | Lookback z-score was computable and inside the threshold band. |
| `insufficient` | Pairwise comparison may exist, but there is not enough history for lookback analysis. |
| `errored` | The benchmark result contains an error payload. |
| `missing_baseline` | No matching baseline result was found. |
| `not_comparable` | The two results cannot be compared, for example because their units differ. |
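The report counts can be derived from these row statuses. The hypothetical `tally_rows` helper below reads one status per line and prints `analyzed`, `regressions`, and `improvements`, assuming each row carries exactly one status from the table and that only `regressed`, `improved`, and `stable` rows have a computable lookback z-score. It does not compute `compared`, since the table alone does not say which statuses imply an attempted pairwise comparison.

```shell
# Hypothetical helper: count row statuses read from stdin, one per line.
# Prints "analyzed=N regressions=N improvements=N".
tally_rows() {
  local row analyzed=0 regressions=0 improvements=0
  while IFS= read -r row; do
    case "$row" in
      regressed)
        regressions=$((regressions + 1))
        analyzed=$((analyzed + 1)) ;;
      improved)
        improvements=$((improvements + 1))
        analyzed=$((analyzed + 1)) ;;
      stable)
        analyzed=$((analyzed + 1)) ;;
      insufficient|errored|missing_baseline|not_comparable)
        ;;  # no computable lookback z-score
      "")
        ;;  # skip blank lines
      *)
        echo "unknown row status: $row" >&2
        return 1 ;;
    esac
  done
  echo "analyzed=$analyzed regressions=$regressions improvements=$improvements"
}
```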
Exit Codes
- `0`: report status is `success` or `skipped`.
- `1`: report status is `failure` or `action_required`.
- `2`: usage, authentication, server, or transport error.
Always publish the Markdown summary when exit code 1 is possible. That is the
case where the diagnostic is most useful.
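In a workflow step, the exit code can be turned into a readable label, which helps separate a report that needs attention from an infrastructure problem where no report file exists. `classify_report_exit` and its labels are hypothetical.

```shell
# Hypothetical helper: label a `conbench ci report` exit code for CI logs.
classify_report_exit() {
  case "$1" in
    0) echo pass ;;             # success or skipped
    1) echo needs-attention ;;  # failure or action_required; a report exists
    2) echo infra-error ;;      # usage, auth, server, or transport error
    *) echo unknown ;;
  esac
}
```

For example, a step could log `classify_report_exit "$report_status"` just before its final `exit`.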
Scheduled Alerts
Use `conbench ci report` for synchronous pull request diagnostics. For scheduled
monitoring, create alert rules through the API or the account dashboard and run
`conbench admin alerts evaluate` from operations automation. The evaluator uses
the same comparison semantics as the CI report and records open and resolve
events in Conbench.
For scheduled notifications, run `conbench admin alerts deliver` with one of
these configurations:
- Generic webhook: `CONBENCH_ALERT_WEBHOOK_URL` or `--webhook-url`.
- Slack incoming webhook: `--channel slack` plus `CONBENCH_ALERT_SLACK_WEBHOOK_URL`
  or `--slack-webhook-url`.
- Repository-scoped scheduled GitHub Checks: `--channel github-check` with
  `CONBENCH_ALERT_GITHUB_REPOSITORY` and a token that can create Check Runs.
- Repository-scoped commit comments: `--channel github-comment` with the same
  repository and token configuration.
- Email: `--channel email` with `CONBENCH_ALERT_EMAIL_SMTP_ADDR`,
  `CONBENCH_ALERT_EMAIL_FROM`, and `CONBENCH_ALERT_EMAIL_TO`.
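Before running `conbench admin alerts deliver`, a job can check that the environment variables for its channel are set. The hypothetical `check_alert_env` helper below covers only the environment-variable route: it ignores flag alternatives such as `--webhook-url` and `--slack-webhook-url`, uses the made-up label `webhook` for the generic-webhook case, and does not check the GitHub token, whose variable name this page does not specify.

```shell
# Hypothetical helper: list the env vars each delivery channel reads.
required_alert_env() {
  case "$1" in
    webhook) echo CONBENCH_ALERT_WEBHOOK_URL ;;  # made-up label for the generic webhook
    slack) echo CONBENCH_ALERT_SLACK_WEBHOOK_URL ;;
    github-check|github-comment) echo CONBENCH_ALERT_GITHUB_REPOSITORY ;;
    email) echo CONBENCH_ALERT_EMAIL_SMTP_ADDR CONBENCH_ALERT_EMAIL_FROM CONBENCH_ALERT_EMAIL_TO ;;
    *) return 1 ;;
  esac
}

# Returns 0 if all variables are set, 1 if any is missing, 2 for an unknown channel.
check_alert_env() {
  local channel="$1" vars var value missing=0
  vars=$(required_alert_env "$channel") || {
    echo "unknown channel: $channel" >&2
    return 2
  }
  for var in $vars; do
    # Indirect lookup; safe because names come from the fixed list above.
    eval "value=\${$var:-}"
    if [ -z "$value" ]; then
      echo "missing $var for channel $channel" >&2
      missing=1
    fi
  done
  return "$missing"
}
```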
Delivery uses a durable outbox over stored alert events, so retries do not
duplicate already delivered events.
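The outbox guarantee can be pictured with a small model: delivery records each event ID after a successful send and skips IDs that are already recorded. This is a conceptual sketch under that assumption, not how Conbench stores its outbox; `deliver_pending`, the ledger file, and the `send_event` stub are all hypothetical.

```shell
# Hypothetical stand-in for a real delivery call (webhook, Slack, email, ...).
send_event() {
  printf 'sent %s\n' "$1"
}

# Conceptual outbox model: read event IDs on stdin, skip IDs already in the
# ledger file, and append each ID to the ledger only after a successful send.
deliver_pending() {
  local ledger="$1" event_id
  touch "$ledger"
  while IFS= read -r event_id; do
    [ -n "$event_id" ] || continue
    if grep -qxF -- "$event_id" "$ledger"; then
      continue  # delivered by an earlier attempt: do not send again
    fi
    if send_event "$event_id"; then
      printf '%s\n' "$event_id" >> "$ledger"
    fi
  done
}
```

Running it twice over the same events sends each one once. In this model, a crash between a send and its ledger write would resend that single event on retry, so it provides at-least-once rather than exactly-once delivery.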
See Alerting for the canonical delivery model and current non-goals.