Developer Tools / AI Inference / LLM Serving

tiny-vllm

Open-source C++ and CUDA LLM inference engine inspired by vLLM, with a teaching-focused course that walks through model serving, batching, KV cache, and attention kernels.

Clear 27/30
Useful 28/30
Specific 16/20
Complete 17/20

Why it was accepted

The page clearly describes an AI infrastructure project: a high-performance LLM inference engine in C++ and CUDA. The README gives concrete implementation details, mentions Safetensors and Llama 3.2 1B Instruct support, and lists engine features like KV cache, static and continuous batching, online softmax, and PagedAttention. It is useful for AI builders and readers who want both code and an educational walkthrough.
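Of the listed features, online softmax is the easiest to show in a few lines. The following is a minimal sketch of the general technique, written in Python for brevity; it is not taken from the tiny-vllm C++/CUDA source, and the function names `online_softmax` and `reference_softmax` are hypothetical. It assumes finite input scores. The idea is to keep a running maximum and a running sum that is rescaled whenever the maximum changes, so the normalizer is obtained in a single pass over the scores.

```python
import math


def online_softmax(scores):
    """Softmax whose max and normalizer are computed in one pass (illustrative sketch)."""
    running_max = float("-inf")
    running_sum = 0.0
    for x in scores:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new max, then add the new term.
        running_sum = running_sum * math.exp(running_max - new_max) + math.exp(x - new_max)
        running_max = new_max
    # Final normalization uses the statistics gathered above.
    return [math.exp(x - running_max) / running_sum for x in scores]


def reference_softmax(scores):
    """Conventional softmax: one pass for the max, another for the sum."""
    m = max(scores)
    exps = [math.exp(x - m) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

Both functions should agree up to floating-point rounding. The single-pass form matters in attention kernels because scores can be processed block by block without first scanning the whole row for its maximum.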

Weakness

The snapshot does not show installation prerequisites, runtime examples, benchmark results, API usage, or maintenance signals beyond the file list and commit count. A visitor still cannot tell how to run the server or what interfaces it exposes.

Review status

Last evaluated 21 days ago. Current rank #82. Down 3 spots in the rankings.

Score history

88

Related listings

CodeGraph
94

Developer Tools / AI for Code

CodeGraph is a local code knowledge graph for AI coding agents like Claude Code, Cursor, Codex, OpenCode, and Hermes Agent. It aims to cut token use, tool calls, and runtime by letting agents query pre-indexed code structure instead of scanning files repeatedly.

LLMRender
92

Developer Tools / React Libraries

A lightweight React Markdown renderer with built-in LaTeX, syntax highlighting, streaming-safe rendering, and security-focused defaults.

Version Sentinel

Developer Tools / AI Coding Guardrails

Claude Code plugin that blocks dependency edits until a fresh, source-cited version check is recorded, helping prevent hallucinated or stale package versions across npm, pip, Poetry/uv, Cargo, and NuGet.

#7 Omni
91

Developer Tools / Search & Retrieval

Omni is a local-first semantic search app for macOS that indexes text, code, PDFs, images, audio, and video on-device. It supports multilingual search, private offline use, and exposes a local endpoint for agents to query indexed files.