Edison Labs
Benchmarks / LabBench2

TableQA2 (image)

tableqa2-img

10 runs · 5 models · evaluated by HybridEvaluator.

#   Model                 Variant     Mode  Score  Avg. duration  Tokens  Date
1   claude-opus-4-5       tools,high  file  0.950  10.0s          888.2k  2026-03-22
2   gemini-3-pro-preview  tools,high  file  0.950  22.2s          124.7k  2026-02-03
3   gpt-5-2               tools,high  file  0.950  1.7m           1.8M    2026-02-03
4   gpt-5-2-pro           -           file  0.940  43.9s          795.5k  2026-02-03
5   gpt-5-2-pro           tools,high  file  0.940  1.0m           1.4M    2026-02-03
6   claude-opus-4-6       tools,high  file  0.930  10.2s          1.6M    2026-03-23
7   gemini-3-pro-preview  -           file  0.930  21.4s          121.5k  2026-02-03
8   gpt-5-2               -           file  0.930  5.7s           801.5k  2026-02-03
9   claude-opus-4-5       -           file  0.920  5.6s           375.1k  2026-03-20
10  claude-opus-4-6       -           file  0.910  6.1s           382.0k  2026-03-20
