SeqQA2
33 runs · 7 models · evaluated by HybridEvaluator.
| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview | tools,high | inject | 0.525 | 6.6m | 61.9M | 2026-01-27 |
| 2 | gemini-3-pro-preview | tools,high | file | 0.522 | 39.1m | 61.8M | 2026-01-24 |
| 3 | gpt-5-2 | tools,high | inject | 0.495 | 1.8m | 3.3M | 2026-01-25 |
| 4 | gemini-3-pro-preview | — | inject | 0.492 | 4.2m | 60.6M | 2026-01-24 |
| 5 | gemini-3-pro-preview | — | file | 0.487 | 4.2m | 60.6M | 2026-01-24 |
| 6 | gpt-5-2-pro | tools,high | inject | 0.463 | 14.2m | 4.8M | 2026-01-25 |
| 7 | claude-opus-4-5 | tools,high | file | 0.455 | 32.7s | 12.1M | 2026-03-22 |
| 8 | gpt-5-2-pro | — | inject | 0.445 | 4.4m | 1.2M | 2026-01-24 |
| 9 | gpt-5-2 | tools,high | file | 0.443 | 2.4m | 1.7M | 2026-01-28 |
| 10 | claude-opus-4-6 | tools,high | inject | 0.443 | 2.5m | 483.1M | 2026-03-23 |
| 11 | claude-opus-4-6 | tools,high | file | 0.440 | 52.6s | 17.7M | 2026-03-23 |
| 12 | claude-opus-4-5 | tools,high | inject | 0.427 | 57.7s | 17.6M | 2026-03-22 |
| 13 | claude-opus-4-6 | — | inject | 0.345 | 16.8s | 63.4M | 2026-03-20 |
| 14 | claude-opus-4-5 | — | inject | 0.307 | 8.8s | 342.8k | 2026-03-20 |
| 15 | claude-opus-4-5 | — | file | 0.270 | 9.3s | 541.0k | 2026-03-20 |
| 16 | claude-opus-4-6 | — | file | 0.260 | 17.4s | 56.7M | 2026-03-20 |
| 17 | gpt-5-2 | — | file | 0.255 | 1.2s | 163.6k | 2026-01-28 |
| 18 | gpt-5-2 | — | inject | 0.253 | 1.3s | 159.5k | 2026-01-28 |
| 19 | claude-opus-4-5_retry_retry | — | file | 0.159 | 9.0s | 28.4k | 2026-01-25 |
| 20 | gemini-3-pro-preview | — | retrieve | 0.120 | 2.7m | 98.6k | 2026-01-25 |
| 21 | gemini-3-pro-preview | tools,high | retrieve | 0.120 | 7.4m | 133.3k | 2026-01-25 |
| 22 | gpt-5-2-pro | tools,high | retrieve | 0.095 | 29.5m | 23.6M | 2026-01-27 |
| 23 | gpt-5-2 | — | retrieve | 0.095 | 4.5s | 28.0k | 2026-01-25 |
| 24 | gpt-5-2 | tools,high | retrieve | 0.080 | 11.7m | 27.2M | 2026-01-25 |
| 25 | gpt-5-2-pro | — | retrieve | 0.075 | 3.4m | 412.4k | 2026-01-25 |
| 26 | claude-opus-4-5_retry | — | file | 0.051 | 8.7s | 41.1k | 2026-01-25 |
| 27 | claude-opus-4-6 | — | retrieve | 0.030 | 5.3s | 55.8k | 2026-03-20 |
| 28 | claude-opus-4-5 | — | retrieve | 0.020 | 5.1s | 62.3k | 2026-03-20 |
| 29 | gpt-5-2-pro | tools,high_retry | inject | 0.009 | 19.3m | 30.0k | 2026-01-25 |
| 30 | gpt-5-2-pro | — | file | — | 0.0s | 0 | 2026-01-24 |
| 31 | gpt-5-2-pro | tools,high | file | — | 0.0s | 0 | 2026-01-25 |
| 32 | claude-opus-4-5 | tools,high | retrieve | — | 48.5s | 59.9M | 2026-03-22 |
| 33 | claude-opus-4-6 | tools,high | retrieve | — | 2.1m | 122.3M | 2026-03-23 |