Arena Visual Dashboard

Benchmark reports, compared the way you actually read them.

Aggregates run reports discovered across AgentKernelArena, typically under `workspace_*/run_*/reports`. Focus on total score, speedup distribution, task-type performance, and per-task status deltas.
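As a rough illustration of the discovery step, the sketch below walks a root directory and collects every directory matching the `workspace_*/run_*/reports` pattern. The function name `discover_report_dirs` and the idea of a single root path are assumptions made for this example; the dashboard's actual discovery logic is not shown here and may differ.

```python
from pathlib import Path


def discover_report_dirs(root):
    """Return report directories under `root`, sorted by path.

    Hypothetical helper: it only mirrors the `workspace_*/run_*/reports`
    layout mentioned above and ignores anything that is not a directory.
    """
    root = Path(root)
    return sorted(
        p for p in root.glob("workspace_*/run_*/reports") if p.is_dir()
    )
```

Calling `discover_report_dirs(".")` from the repository root would list candidate report folders; what each folder contains is left open here.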

Overview

Snapshot

Snapshot cards highlight the strongest total score, the steadiest metrics, and the current dataset coverage.

Controls

Filters

The `Report` filter applies to every view. The `Task type`, `Status`, and search filters mainly shape the task explorer below.
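The split between a global filter and explorer-only filters can be sketched as two functions. Everything named here (`TaskRow`, its fields, the status strings, and the case-insensitive substring search) is a hypothetical stand-in for illustration, not the dashboard's real data model.

```python
from dataclasses import dataclass


@dataclass
class TaskRow:
    # Hypothetical row shape: one task result within one report.
    report: str
    task: str
    task_type: str
    status: str


def filter_by_report(rows, reports=()):
    """Global filter: an empty selection keeps every report."""
    selected = set(reports)
    return [r for r in rows if not selected or r.report in selected]


def filter_for_explorer(rows, reports=(), task_types=(), statuses=(), search=""):
    """Explorer filter: the report filter plus task type, status, and search."""
    types, states = set(task_types), set(statuses)
    needle = search.strip().lower()
    return [
        r
        for r in filter_by_report(rows, reports)
        if (not types or r.task_type in types)
        and (not states or r.status in states)
        and (not needle or needle in r.task.lower())
    ]
```

Under this sketch, summary views would call only `filter_by_report`, while the task explorer would call `filter_for_explorer` with the full set of controls.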

Reports
Task types
Status
Leaderboard

Report Ranking

This table keeps the full report-level metrics visible so you can quickly judge which report is strongest, which is most stable, and which fails most often.
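A minimal sketch of how such report-level rows might be assembled is shown below. It assumes each report already carries a precomputed `total_score` and a non-empty list of per-task speedups; how the total score itself is derived is not specified here, so the sketch does not attempt it. The key names are hypothetical.

```python
from statistics import mean, median


def summarize_speedups(speedups):
    """Summarize a non-empty list of per-task speedups."""
    return {
        "avg_speedup": mean(speedups),
        "median_speedup": median(speedups),
        # Count of tasks that ran faster than baseline (speedup > 1.0).
        "above_1_0": sum(1 for s in speedups if s > 1.0),
    }


def rank_reports(reports):
    """Build one row per report and sort by total score, highest first.

    `reports` maps a report name to a dict with the assumed keys
    "total_score" and "speedups".
    """
    rows = [
        {
            "report": name,
            "total_score": data["total_score"],
            **summarize_speedups(data["speedups"]),
        }
        for name, data in reports.items()
    ]
    return sorted(rows, key=lambda row: row["total_score"], reverse=True)
```

Showing the average and the median side by side helps separate a report that is consistently fast from one lifted by a few outlier tasks.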

Report | Total score | Avg speedup | Median | Compile | Correct | > 1.0 | Sources

Heatmap

Task Type Comparison

Standouts

Task Spotlight

Based on the current filters, this highlights the most interesting tasks and the reports currently leading them.

Explorer

Task Matrix