Initiated and Officially Supported by Tensormesh
TL;DR: LMCache turbocharges vLLM with 7× faster access to 100× more KV caches, for both multi-turn conversation and RAG. [💻 Source code] [📚 Paper 1] [📚 Paper 2] [🎬 3-minute introduction video] LLMs are ubiquitous across industries, but when they are used with long documents, it takes forever for the model even to spit…