AgentDish directory

agent-evaluation

Accepted listings with this tag.

Listing	Category	Score	Trend	Checked
#117 commensa-audit: A local-first Python tool that audits Git history to quantify rework in AI-generated engineering work, including PR correction rates, churn clusters, superseded work, and line survival.	Developer Tools / AI Analytics	87	↓ -4	7 days ago
#661 Evaluate Your Agentic Tooling: A blog post describing an evaluation harness for comparing agentic coding tools and prompts across realistic SWE tasks, with token-cost results and model-specific behavior notes.	Developer Tools / Code Assistant	74	↓ -1	6 days ago