AgentDish directory

llm-inference

Accepted listings with this tag.

Rank	Listing	Description	Category	Score	Trend	Checked
#82	tiny-vllm	Open-source C++ and CUDA LLM inference engine inspired by vLLM, with a teaching-focused course that walks through model serving, batching, KV cache, and attention kernels.	Developer Tools / AI Inference / LLM Serving	88	↓ -3	21 days ago
#147	Benchmarking Inference Engines on Agentic Workloads	A research article from Applied Compute on how agentic, tool-using workloads differ from traditional LLM benchmarks, with production observations, workload profiles, and an open-source harness for replaying traces.	Research / Knowledge Work	87	↓ -47	45 days ago
#169	ZSE v2.0.0	A pure-Python LLM inference engine and server with CUDA/HIP/Metal code generation, OpenAI-compatible API support, built-in RAG, and multi-GPU backend support.	Developer Tools / AI / ML Infrastructure	86	↑ +2	18 days ago
#605	Achieving 3X speedups on Google TPUs with diffusion-style speculative decoding	Google Developers Blog post about integrating DFlash, a diffusion-style speculative decoding framework, into the vLLM TPU ecosystem to improve LLM serving speed on TPU v5p.	Developer Tools / Code Assistant	78	↓ -127	45 days ago
#703	vLLM-Compile	A public slide deck about vLLM-compile, a project focused on bringing compiler optimizations to LLM inference and speeding up torch.compile for vLLM workflows.	Developer Tools / Code Assistant	72	↓ -20	45 days ago