DBQA2
10 runs · 5 models · evaluated by HybridEvaluator.
| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview | tools,high | inject | 0.453 | 3.1m | 51.3k | 2026-01-26 |
| 2 | gpt-5-2 | tools,high | inject | 0.302 | 6.4m | 8.3M | 2026-01-25 |
| 3 | claude-opus-4-6 | tools,high | inject | 0.279 | 3.6m | 75.9M | 2026-03-22 |
| 4 | claude-opus-4-5 | tools,high | inject | 0.198 | 1.4m | 31.5M | 2026-03-22 |
| 5 | gpt-5-2-pro | tools,high | inject | 0.105 | 14.6m | 7.7M | 2026-01-25 |
| 6 | claude-opus-4-6 | — | inject | 0.093 | 7.9s | 25.3k | 2026-03-20 |
| 7 | gemini-3-pro-preview | — | inject | 0.070 | 47.9s | 26.0k | 2026-01-22 |
| 8 | gpt-5-2-pro | — | inject | 0.070 | 1.3m | 86.1k | 2026-01-22 |
| 9 | gpt-5-2 | — | inject | 0.070 | 5.7s | 19.7k | 2026-01-22 |
| 10 | claude-opus-4-5 | — | inject | 0.058 | 6.7s | 25.5k | 2026-03-20 |