Benchmark reports, compared the way you actually read them.
Aggregates run reports discovered across AgentKernelArena, typically under `workspace_*/run_*/reports`. The comparison focuses on total score, speedup distribution, task-type performance, and per-task status deltas.
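As a rough illustration of the discovery step, the sketch below collects every directory matching the `workspace_*/run_*/reports` pattern under a root folder. The function name `discover_report_dirs` and the `root` argument are hypothetical; the actual discovery logic in AgentKernelArena may differ and may also look in other locations.

```python
from pathlib import Path


def discover_report_dirs(root):
    """Return sorted report directories under root.

    Assumption: reports live at workspace_*/run_*/reports, the typical
    layout named above. Other layouts are not handled by this sketch.
    """
    root = Path(root)
    # Path.glob matches each pattern segment against one path component.
    candidates = root.glob("workspace_*/run_*/reports")
    return sorted(p for p in candidates if p.is_dir())
```

Calling `discover_report_dirs(".")` from the repository root would return a list of `Path` objects, one per run that has a `reports` directory.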
Snapshot
Snapshot cards highlight the strongest total score, the steadiest metrics, and the current dataset coverage.
Filters
`Report` filters apply to every view. `Task type / status / search` mainly shape the task explorer below.
Report Ranking
This table keeps the full report-level metrics visible so you can quickly judge which report is strongest, which is most stable, and which fails more often.
| Report | Total score | Avg speedup | Median | Compile | Correct | > 1.0 | Sources |
|---|---|---|---|---|---|---|---|
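
To make the columns concrete, here is a minimal sketch of how several of them could be derived from per-task results. Everything in it is an assumption for illustration: the `TaskResult` record and its fields are hypothetical, speedups are taken over correct tasks only, and `Compile` and `Correct` are treated as rates. `Total score` is left out because its formula is not stated here.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class TaskResult:
    """Hypothetical per-task record; not the real report schema."""
    task: str
    compiled: bool
    correct: bool
    speedup: float  # relative to baseline; 1.0 means no change


def summarize(results):
    """Compute report-level metrics from a list of TaskResult."""
    n = len(results)
    # Assumption: only correct tasks contribute to speedup statistics.
    speedups = [r.speedup for r in results if r.correct]
    return {
        "avg_speedup": mean(speedups) if speedups else 0.0,
        "median_speedup": median(speedups) if speedups else 0.0,
        "compile_rate": sum(r.compiled for r in results) / n if n else 0.0,
        "correct_rate": sum(r.correct for r in results) / n if n else 0.0,
        # Count of correct tasks that beat the baseline ("> 1.0" column).
        "faster_than_baseline": sum(1 for s in speedups if s > 1.0),
    }
```

Showing the median next to the average is useful because a single very large speedup can inflate the mean while leaving the median close to the typical task.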
Task Type Comparison
Task Spotlight
Based on the current filters, this highlights the most interesting tasks and the reports currently leading them.