Edison Labs

FigQA2 (image)

figqa2-img

10 runs · 5 models · evaluated by HybridEvaluator.

#   Model                 Variant     Mode  Score  Avg. duration  Tokens  Date
1   gpt-5-2               tools,high  file  0.663  8.1m           3.4M    2026-02-03
2   gpt-5-2-pro           -           file  0.614  1.8m           1.1M    2026-02-03
3   gemini-3-pro-preview  -           file  0.604  1.8m           143.9k  2026-02-03
4   gpt-5-2-pro           tools,high  file  0.594  2.8m           1.8M    2026-02-03
5   gpt-5-2               -           file  0.564  12.2s          1.1M    2026-02-03
6   gemini-3-pro-preview  tools,high  file  0.545  3.9m           729.1k  2026-02-03
7   claude-opus-4-5       tools,high  file  0.446  21.6s          1.6M    2026-03-22
8   claude-opus-4-5       -           file  0.426  8.0s           526.8k  2026-03-20
9   claude-opus-4-6       -           file  0.416  8.6s           531.4k  2026-03-20
10  claude-opus-4-6       tools,high  file  0.406  41.0s          6.6M    2026-03-22
