Edison Labs

SeqQA2

33 runs · 7 models · evaluated by HybridEvaluator.

| # | Model | Variant | Mode | Score | Avg. dur | Tokens | Date |
|---|-------|---------|------|-------|----------|--------|------|
| 1 | gemini-3-pro-preview | tools,high | inject | 0.525 | 6.6m | 61.9M | 2026-01-27 |
| 2 | gemini-3-pro-preview | tools,high | file | 0.522 | 39.1m | 61.8M | 2026-01-24 |
| 3 | gpt-5-2 | tools,high | inject | 0.495 | 1.8m | 3.3M | 2026-01-25 |
| 4 | gemini-3-pro-preview | | inject | 0.492 | 4.2m | 60.6M | 2026-01-24 |
| 5 | gemini-3-pro-preview | | file | 0.487 | 4.2m | 60.6M | 2026-01-24 |
| 6 | gpt-5-2-pro | tools,high | inject | 0.463 | 14.2m | 4.8M | 2026-01-25 |
| 7 | claude-opus-4-5 | tools,high | file | 0.455 | 32.7s | 12.1M | 2026-03-22 |
| 8 | gpt-5-2-pro | | inject | 0.445 | 4.4m | 1.2M | 2026-01-24 |
| 9 | gpt-5-2 | tools,high | file | 0.443 | 2.4m | 1.7M | 2026-01-28 |
| 10 | claude-opus-4-6 | tools,high | inject | 0.443 | 2.5m | 483.1M | 2026-03-23 |
| 11 | claude-opus-4-6 | tools,high | file | 0.440 | 52.6s | 17.7M | 2026-03-23 |
| 12 | claude-opus-4-5 | tools,high | inject | 0.427 | 57.7s | 17.6M | 2026-03-22 |
| 13 | claude-opus-4-6 | | inject | 0.345 | 16.8s | 63.4M | 2026-03-20 |
| 14 | claude-opus-4-5 | | inject | 0.307 | 8.8s | 342.8k | 2026-03-20 |
| 15 | claude-opus-4-5 | | file | 0.270 | 9.3s | 541.0k | 2026-03-20 |
| 16 | claude-opus-4-6 | | file | 0.260 | 17.4s | 56.7M | 2026-03-20 |
| 17 | gpt-5-2 | | file | 0.255 | 1.2s | 163.6k | 2026-01-28 |
| 18 | gpt-5-2 | | inject | 0.253 | 1.3s | 159.5k | 2026-01-28 |
| 19 | claude-opus-4-5_retry_retry | | file | 0.159 | 9.0s | 28.4k | 2026-01-25 |
| 20 | gemini-3-pro-preview | | retrieve | 0.120 | 2.7m | 98.6k | 2026-01-25 |
| 21 | gemini-3-pro-preview | tools,high | retrieve | 0.120 | 7.4m | 133.3k | 2026-01-25 |
| 22 | gpt-5-2-pro | tools,high | retrieve | 0.095 | 29.5m | 23.6M | 2026-01-27 |
| 23 | gpt-5-2 | | retrieve | 0.095 | 4.5s | 28.0k | 2026-01-25 |
| 24 | gpt-5-2 | tools,high | retrieve | 0.080 | 11.7m | 27.2M | 2026-01-25 |
| 25 | gpt-5-2-pro | | retrieve | 0.075 | 3.4m | 412.4k | 2026-01-25 |
| 26 | claude-opus-4-5_retry | | file | 0.051 | 8.7s | 41.1k | 2026-01-25 |
| 27 | claude-opus-4-6 | | retrieve | 0.030 | 5.3s | 55.8k | 2026-03-20 |
| 28 | claude-opus-4-5 | | retrieve | 0.020 | 5.1s | 62.3k | 2026-03-20 |
| 29 | gpt-5-2-pro | tools,high_retry | inject | 0.009 | 19.3m | 30.0k | 2026-01-25 |
| 30 | gpt-5-2-pro | | file | | 0.0s | 0 | 2026-01-24 |
| 31 | gpt-5-2-pro | tools,high | file | | 0.0s | 0 | 2026-01-25 |
| 32 | claude-opus-4-5 | tools,high | retrieve | | 48.5s | 59.9M | 2026-03-22 |
| 33 | claude-opus-4-6 | tools,high | retrieve | | 2.1m | 122.3M | 2026-03-23 |