AgentDish directory

agent-evaluation

Accepted listings with this tag.

Listing Category Score Trend Checked
#117 ↓ -4
commensa-audit

A local-first Python tool that audits Git history to quantify rework in AI-generated engineering work, including PR correction rates, churn clusters, superseded work, and line survival.

Category: Developer Tools / AI Analytics | Score: 87 | Trend: ↓ -4 | Checked: 7 days ago

A blog post describing an evaluation harness for comparing agentic coding tools and prompts across realistic SWE tasks, with token-cost results and model-specific behavior notes.

Category: Developer Tools / Code Assistant | Score: 74 | Trend: ↓ -1 | Checked: 6 days ago