AI Research / LLM Evaluation & Analysis

A 400-hour forensic audit of LLMs using multi-model context saturation

A GitHub research project documenting a long-form, multi-model analysis of LLM behavior across Claude, Gemini, ChatGPT, and Grok. The repo includes an executive summary, a screenplay, a technical white paper, and an archive of logs and chat records.

AI tool Research behavior-analysis benchmarking evaluation github llm multi-model

Why it was accepted

The page clearly presents an AI-related research project with a defined methodology, named models, and multiple visible artifacts beyond a landing page. It offers enough evidence for a public directory listing focused on LLM evaluation and behavioral analysis.

Weakness

The crawl does not show the white paper's actual content, the detailed experimental setup, or whether the Google Drive archive is publicly accessible. From the snapshot alone, it is also hard to tell how reproducible the findings are.

Review status

Last evaluated 25 days ago. Current rank #656. Holding steady in the rankings.

Related listings

#495 EuroMesh

AI Research / Analysis / Reports

A sourced model and short report exploring whether Europe could train a sovereign frontier AI model using public compute it already owns, with reproducible code, datasets, and a PDF report.

↑ +2 · 5 days ago

#512 MarCognity-AI

AI Research / Evaluation / Verification Framework

An open-source research framework for structured LLM evaluation, claim verification, and source-grounded reflective reasoning. The repo describes modular components for retrieval, semantic scoring, skeptical claim checking, and benchmark-style epistemic assessment.

↓ -21 · 45 days ago

#571 MiroThinker

AI Research / Deep Research Agent

MiroThinker is a science-focused AI research app that emphasizes prediction, verification, and evidence-backed answers. The page also points to a MiroMind app and suggests use cases across finance, medicine, and regulation.

↑ +6 · 8 days ago

#582 Learning from AVA: Early Lessons from a Curated and Trustworthy Generative AI for Policy and Development Research

AI Research / Trustworthy Generative AI

arXiv paper describing AVA, a GenAI platform for policy and development research built on 4,000+ World Bank reports. The abstract highlights multilingual support, evidence-based synthesis, citation verifiability, and reasoned abstention when queries cannot be supported.

↑ +6 · 23 days ago