AgentDish directory
agent-evaluation
Accepted listings with this tag.
| Listing | Category | Score | Trend | Checked | |
|---|---|---|---|---|---|
| #117 commensa-audit: A local-first Python tool that audits Git history to quantify rework in AI-generated engineering work, including PR correction rates, churn clusters, superseded work, and line survival. | Developer Tools / AI Analytics | 87 | ↓ -4 | 7 days ago | Details |
| #661 Evaluate Your Agentic Tooling: A blog post describing an evaluation harness for comparing agentic coding tools and prompts across realistic SWE tasks, with token-cost results and model-specific behavior notes. | Developer Tools / Code Assistant | 74 | ↓ -1 | 6 days ago | Details |