Edison Labs
Benchmarks / LabBench2

DBQA2

10 runs · 5 models · evaluated by HybridEvaluator.

| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview | tools,high | inject | 0.453 | 3.1m | 51.3k | 2026-01-26 |
| 2 | gpt-5-2 | tools,high | inject | 0.302 | 6.4m | 8.3M | 2026-01-25 |
| 3 | claude-opus-4-6 | tools,high | inject | 0.279 | 3.6m | 75.9M | 2026-03-22 |
| 4 | claude-opus-4-5 | tools,high | inject | 0.198 | 1.4m | 31.5M | 2026-03-22 |
| 5 | gpt-5-2-pro | tools,high | inject | 0.105 | 14.6m | 7.7M | 2026-01-25 |
| 6 | claude-opus-4-6 | | inject | 0.093 | 7.9s | 25.3k | 2026-03-20 |
| 7 | gemini-3-pro-preview | | inject | 0.070 | 47.9s | 26.0k | 2026-01-22 |
| 8 | gpt-5-2-pro | | inject | 0.070 | 1.3m | 86.1k | 2026-01-22 |
| 9 | gpt-5-2 | | inject | 0.070 | 5.7s | 19.7k | 2026-01-22 |
| 10 | claude-opus-4-5 | | inject | 0.058 | 6.7s | 25.5k | 2026-03-20 |
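
Each model in the table above appears twice: once with the `tools,high` variant and once with no variant listed. The sketch below computes the score difference between the two runs for each model. It is only an illustration: the scores are copied from the table, the function name `variant_gain` is hypothetical, and treating the no-variant row as the baseline for the same model is an assumption, since the page does not define what the variant labels mean.

```python
# Scores copied from the table above, keyed by (model, variant).
# An empty variant string stands for the rows listed without a variant.
scores = {
    ("gemini-3-pro-preview", "tools,high"): 0.453,
    ("gpt-5-2", "tools,high"): 0.302,
    ("claude-opus-4-6", "tools,high"): 0.279,
    ("claude-opus-4-5", "tools,high"): 0.198,
    ("gpt-5-2-pro", "tools,high"): 0.105,
    ("claude-opus-4-6", ""): 0.093,
    ("gemini-3-pro-preview", ""): 0.070,
    ("gpt-5-2-pro", ""): 0.070,
    ("gpt-5-2", ""): 0.070,
    ("claude-opus-4-5", ""): 0.058,
}


def variant_gain(scores, variant="tools,high", baseline=""):
    """Return {model: score(variant) - score(baseline)} for models with both runs.

    Assumption: the baseline row is the same model evaluated without the variant.
    """
    gains = {}
    for (model, var), score in scores.items():
        if var == variant and (model, baseline) in scores:
            gains[model] = round(score - scores[(model, baseline)], 3)
    return gains


gains = variant_gain(scores)

# Print models ordered by largest score difference first.
for model, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model:<22} {gain:+.3f}")
```

The differences are raw score gaps between single leaderboard entries, so they say nothing about run-to-run variance.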
