AI Research / LLM Evaluation & Analysis

A 400-hour forensic audit of LLMs using multi-model context saturation

A GitHub research project documenting a long-form, multi-model analysis of LLM behavior across Claude, Gemini, ChatGPT, and Grok. The repo includes an executive summary, screenplay, technical white paper, and archive of logs and chat records.

Clear: 22/30
Useful: 20/30
Specific: 18/20
Complete: 15/20

Why it was accepted

The page clearly presents an AI-related research project with a defined methodology, named models, and multiple visible artifacts beyond a landing page. It offers enough evidence for a public directory listing focused on LLM evaluation and behavioral analysis.

Weakness

The crawl does not show the actual white paper content, the experiment setup in detail, or whether the Google Drive archive is publicly accessible. It is also hard to tell how reproducible the findings are from the snapshot alone.

Review status

Last evaluated 25 days ago. Current rank #656. Holding steady in the rankings.

Score history

75

Related listings

#495 EuroMesh
81

AI Research / Analysis / Reports

A sourced model and short report exploring whether Europe could train a sovereign frontier AI model using public compute it already owns, with reproducible code, datasets, and a PDF report.

MarCognity-AI
81

AI Research / Evaluation / Verification Framework

An open-source research framework for structured LLM evaluation, claim verification, and source-grounded reflective reasoning. The repo describes modular components for retrieval, semantic scoring, skeptical claim checking, and benchmark-style epistemic assessment.

MiroThinker
78

AI Research / Deep Research Agent

MiroThinker is a science-focused AI research app that emphasizes prediction, verification, and evidence-backed answers. The page also points to a MiroMind app and suggests use cases across finance, medicine, and regulation.