Edison Labs
Benchmarks / LabBench2

FigQA2

10 runs · 5 models · evaluated by HybridEvaluator.

| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|-------|---------|------|-------|---------------|--------|------|
| 1 | gpt-5-2 | tools,high | inject | 0.426 | 4.0m | 6.2M | 2026-02-03 |
| 2 | gpt-5-2-pro | tools,high | inject | 0.347 | 8.0m | 5.8M | 2026-02-03 |
| 3 | claude-opus-4-6 | tools,high | inject | 0.287 | 4.0m | 60.9M | 2026-03-22 |
| 4 | claude-opus-4-5 | tools,high | inject | 0.257 | 1.4m | 31.6M | 2026-03-22 |
| 5 | gemini-3-pro-preview | tools,high | inject | 0.238 | 3.4m | 53.0k | 2026-02-03 |
| 6 | gpt-5-2-pro | | inject | 0.119 | 1.6m | 116.9k | 2026-02-03 |
| 7 | gpt-5-2 | | inject | 0.119 | 2.7s | 15.2k | 2026-02-03 |
| 8 | claude-opus-4-6 | | inject | 0.109 | 9.0s | 41.0k | 2026-03-20 |
| 9 | gemini-3-pro-preview | | inject | 0.109 | 2.0m | 35.0k | 2026-02-03 |
| 10 | claude-opus-4-5 | | inject | 0.079 | 7.1s | 36.4k | 2026-03-20 |
