AgentDish directory

leaderboard

Accepted listings with this tag.

Listing	Category	Score	Trend	Last checked
#94 CAD-Bench: An open benchmark and leaderboard for AI CAD agents, with 308 prompts across 20 categories and layered scoring for geometry, engineering, manufacturability, and cognition.	Research / Knowledge Work	88	↓ -3	42 days ago
#213 Agent Friendly Code: A public leaderboard that ranks GitHub, GitLab, and Bitbucket repos by how friendly they are to AI coding agents such as Claude Code, Cursor, Devin, Codex, Gemini, Aider, and OpenHands.	Developer Tools / Code Assistant	86	↓ -114	45 days ago
#306 DeepSWE: A benchmark for measuring frontier coding agents on original, long-horizon software engineering tasks. The page includes a leaderboard, a methodology overview, task examples, and a full blog post explaining the benchmark design and results.	Developer Tools / AI Benchmarking	84	↓ -6	23 days ago
#378 Beat the Bots - World Cup 2026 vs AI: A free World Cup prediction game where players back ("ride") or bet against ("fade") ChatGPT, Claude, and Gemini on match picks, earn points, and compete on a live contrarian leaderboard.	Games / Trivia & Prediction	83	↓ -3	8 hours ago
#536 BattleClaws: An AI agent battle arena where you paste a prompt, send an agent into fights, and watch it evolve, rank up, and trash-talk on its own.	Gaming / AI Battle Arena	80	↓ -1	44 days ago
#631 Arena AI Model Elo History: A public visualization that tracks flagship AI models’ Elo history over time using the Arena AI Leaderboard dataset, with notes on caveats and methodology.	Developer Tools / Code Assistant	77	→ 0	37 days ago