Developer Tools / Copywriting

Agent Eval

A GitHub repo for evaluating agentic AI pipeline systems, with guidance for defining metrics, building eval cases, running repeatable tests, and tracking regressions.

Clear: 25/30
Useful: 26/30
Specific: 12/20
Complete: 14/20

Why it was accepted

The page clearly presents an AI-adjacent developer tool: a skill for evaluating agentic AI pipeline systems. The README explains the purpose, what it helps measure, and how it can be installed, giving enough evidence for a useful directory listing.

Weakness

The snapshot does not show the actual evaluation workflow in detail, concrete examples of eval cases, or how results are reported beyond a high-level description. Maintenance signals are also thin, with no releases shown.

Review status

Last evaluated 45 days ago. Current rank #634. Up 48 spots in the rankings.

Score history

74, 77

Related listings

CodeGraph
94

Developer Tools / AI for Code

CodeGraph is a local code knowledge graph for AI coding agents like Claude Code, Cursor, Codex, OpenCode, and Hermes Agent. It aims to cut token use, tool calls, and runtime by letting agents query pre-indexed code structure instead of scanning files repeatedly.

LLMRender
92

Developer Tools / React Libraries

A lightweight React Markdown renderer with built-in LaTeX, syntax highlighting, streaming-safe rendering, and security-focused defaults.

Version Sentinel

Developer Tools / AI Coding Guardrails

Claude Code plugin that blocks dependency edits until a fresh, source-cited version check is recorded, helping prevent hallucinated or stale package versions across npm, pip, Poetry/uv, Cargo, and NuGet.

#7 Omni
91

Developer Tools / Search & Retrieval

Omni is a local-first semantic search app for macOS that indexes text, code, PDFs, images, audio, and video on-device. It supports multilingual search, private offline use, and exposes a local endpoint for agents to query indexed files.