AgentDish directory
leaderboard
Accepted listings with this tag.
| Rank | Listing | Description | Category | Score | Trend | Checked |
|---|---|---|---|---|---|---|
| #94 | CAD-Bench | An open benchmark and leaderboard for AI CAD agents, with 308 prompts across 20 categories and layered scoring for geometry, engineering, manufacturability, and cognition. | Research / Knowledge Work | 88 | ↓ -3 | 42 days ago |
| #213 | Agent Friendly Code | A public leaderboard that ranks GitHub, GitLab, and Bitbucket repos by how friendly they are to AI coding agents such as Claude Code, Cursor, Devin, Codex, Gemini, Aider, and OpenHands. | Developer Tools / Code Assistant | 86 | ↓ -114 | 45 days ago |
| #306 | DeepSWE | DeepSWE is a benchmark for measuring frontier coding agents on original, long-horizon software engineering tasks. The page shows a leaderboard, methodology overview, task examples, and a full blog explaining the benchmark design and results. | Developer Tools / AI Benchmarking | 84 | ↓ -6 | 23 days ago |
| #378 | Beat the Bots - World Cup 2026 vs AI | A free World Cup prediction game where players ride or fade ChatGPT, Claude, and Gemini on match picks, earn points, and compete on a live contrarian leaderboard. | Games / Trivia & Prediction | 83 | ↓ -3 | 8 hours ago |
| #536 | BattleClaws | BattleClaws is an AI agent battle arena where you paste a prompt, send an agent into fights, and watch it evolve, rank up, and trash talk on its own. | Gaming / AI Battle Arena | 80 | ↓ -1 | 44 days ago |
| #631 | Arena AI Model Elo History | A public visualization that tracks flagship AI models’ Elo history over time using the Arena AI Leaderboard dataset, with notes on caveats and methodology. | Developer Tools / Code Assistant | 77 | → 0 | 37 days ago |