Edison Labs
TableQA2 (PDF), LabBench2 benchmark

ID: tableqa2-pdf

10 runs · 5 models · evaluated by HybridEvaluator.

#   Model                 Variant     Mode  Score  Avg. duration  Tokens  Date
1   claude-opus-4-6       tools,high  file  0.880  46.3 s         13.7M   2026-03-23
2   gemini-3-pro-preview  -           file  0.880  41.8 s         946.0k  2026-02-03
3   gemini-3-pro-preview  tools,high  file  0.880  53.3 s         957.3k  2026-02-03
4   claude-opus-4-5       tools,high  file  0.850  29.6 s         7.9M    2026-03-22
5   gpt-5-2               tools,high  file  0.840  4.1 min        8.6M    2026-02-03
6   gpt-5-2-pro           -           file  0.820  2.4 min        9.9M    2026-02-03
7   claude-opus-4-6       -           file  0.810  38.2 s         16.8M   2026-03-20
8   gpt-5-2-pro           tools,high  file  0.800  4.0 min        9.0M    2026-02-03
9   gpt-5-2               -           file  0.760  1.9 min        10.0M   2026-02-03
10  claude-opus-4-5       -           file  0.750  21.9 s         8.4M    2026-03-20

A "-" in the Variant column means no variant was listed for that run.
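Every model appears twice, once with the `tools,high` variant and once with no variant listed, so the leaderboard supports a simple paired comparison. The Python sketch below computes the per-model score difference between the two runs. The `Run` record and `variant_deltas` helper are hypothetical illustrations, not part of LabBench2 or HybridEvaluator. Treating the no-variant run as each model's baseline is an assumption. With a single run per configuration, the differences also carry no estimate of run-to-run noise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One leaderboard row (hypothetical record type for illustration)."""
    model: str
    variant: str  # "" when the leaderboard lists no variant (assumed baseline)
    score: float


# Model, variant, and score copied from the leaderboard rows.
RUNS = [
    Run("claude-opus-4-6", "tools,high", 0.880),
    Run("gemini-3-pro-preview", "", 0.880),
    Run("gemini-3-pro-preview", "tools,high", 0.880),
    Run("claude-opus-4-5", "tools,high", 0.850),
    Run("gpt-5-2", "tools,high", 0.840),
    Run("gpt-5-2-pro", "", 0.820),
    Run("claude-opus-4-6", "", 0.810),
    Run("gpt-5-2-pro", "tools,high", 0.800),
    Run("gpt-5-2", "", 0.760),
    Run("claude-opus-4-5", "", 0.750),
]


def variant_deltas(runs):
    """Return {model: score(tools,high) - score(no variant)} for models with both runs."""
    by_model = {}
    for run in runs:
        # Group scores by model, keyed by variant string.
        by_model.setdefault(run.model, {})[run.variant] = run.score
    return {
        # Round to 3 decimals to match the leaderboard's score precision.
        model: round(scores["tools,high"] - scores[""], 3)
        for model, scores in by_model.items()
        if "tools,high" in scores and "" in scores
    }


# Print models from largest to smallest gain for the tools,high variant.
for model, delta in sorted(variant_deltas(RUNS).items(), key=lambda kv: -kv[1]):
    print(f"{model:22} {delta:+.3f}")
```

On these rows, the `tools,high` variant scores higher for three models (claude-opus-4-5, gpt-5-2, claude-opus-4-6), ties for gemini-3-pro-preview, and is 0.020 lower for gpt-5-2-pro.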
