Breaking News: “CacheBlend” Receives BEST PAPER AWARD at ACM EuroSys 2025 This week at ACM EuroSys 2025, a top academic conference in computer systems, Jiayi Yao, the first author of the CacheBlend paper, will present our work that redefines the landscape of LLM efficiency, particularly in retrieval-augmented generation (RAG) applications. This paper has…

TL;DR: In the AI arms race, it’s no longer just about who has the best model; it’s about who has the best LLM serving system. vLLM has taken the open-source community by storm, with unparalleled hardware and model support and an active ecosystem of top-notch contributors. But until now, vLLM has mostly focused on…

🚀 Building your system on KV caches? Try building it on LMCache! By using LMCache for your research, you can focus on KV cache management while we handle all the vLLM integration and compatibility for you. Here’s why LMCache works as your research testbed: Check our codebase and documentation for more information! Update: we are…

We’re excited to announce that our LMCache documentation is now live! 🎉 This documentation website will help you get started quickly and understand all the key features. Here’s what you’ll find: Our documentation is designed for both beginners and experienced developers who want to optimize LLM inference and explore cutting-edge techniques. Check out the documentation…

TL;DR: LMCache turboboosts vLLM with 7× faster access to 100× more KV caches, for both multi-turn conversation and RAG. [💻 Source code] [📚 Paper1] [📚 Paper2] [🎬 3-minute introduction video] LLMs are ubiquitous across industries, but when using them with long documents, it takes forever for the model even to spit…
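For readers who want to see what this looks like in practice, below is a minimal sketch of plugging LMCache into vLLM’s offline API as a KV-transfer connector. It is an illustration rather than code from the post: the model name and `report.txt` are placeholders, and the `LMCACHE_*` environment variables and `LMCacheConnectorV1` connector name follow LMCache’s example configuration, which may differ across vLLM and LMCache versions.

```python
# Minimal sketch (assumptions noted above): LMCache as vLLM's KV-transfer
# connector, so KV caches for repeated long contexts are stored and reused.
import os

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# LMCache is configured through environment variables: the KV-cache chunk
# size and an optional CPU-memory backend for offloading cached chunks.
os.environ["LMCACHE_CHUNK_SIZE"] = "256"
os.environ["LMCACHE_LOCAL_CPU"] = "True"
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"  # GB of CPU RAM for KV caches

# Register LMCache as the KV connector so prefixes computed once
# (a long RAG document, earlier chat turns) can be loaded back later.
llm = LLM(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder model
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",
        kv_role="kv_both",
    ),
    gpu_memory_utilization=0.8,
)

long_context = open("report.txt").read()  # placeholder shared document
sampling = SamplingParams(temperature=0.0, max_tokens=128)

# The first call prefills and caches the long context; the second reuses
# the stored KV cache instead of recomputing it.
print(llm.generate([long_context + "\n\nQ: Summarize the report."], sampling))
print(llm.generate([long_context + "\n\nQ: List the key risks."], sampling))
```

With the connector registered, the KV cache built while answering the first question is kept by LMCache and reused for the second, so the shared long context should only be prefilled once.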
