TL;DR: 🚀 CacheGen lets you store KV caches on disk or AWS S3 and load them back far faster than recomputing them! It compresses your KV cache to up to 3× smaller than quantization can, so you can load your KV cache blazingly fast while keeping response quality high. Stop wasting compute — use CacheGen to fully utilize…
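To make the store-and-reload idea concrete, here is a minimal sketch, not CacheGen's actual codec or LMCache's API: the helper names (`save_kv_cache`, `load_kv_cache`), the bucket/key values, and the use of plain `torch.save` plus `boto3` in place of CacheGen's custom compression are all illustrative assumptions.

```python
# Illustrative sketch only: CacheGen's real format applies a custom codec that
# compresses KV tensors well beyond quantization alone; here we just show the
# store/reload flow with plain torch serialization and boto3.
import io

import boto3
import torch

def save_kv_cache(kv_cache, bucket: str, key: str) -> None:
    """Serialize a KV cache (e.g., per-layer (K, V) tensor pairs) to S3."""
    buf = io.BytesIO()
    torch.save(kv_cache, buf)  # CacheGen would apply its own compression here
    buf.seek(0)
    boto3.client("s3").upload_fileobj(buf, bucket, key)

def load_kv_cache(bucket: str, key: str):
    """Fetch and deserialize a stored KV cache instead of re-running prefill."""
    buf = io.BytesIO()
    boto3.client("s3").download_fileobj(bucket, key, buf)
    buf.seek(0)
    return torch.load(buf)

# Example: cache the KV pairs of a long shared prompt once, then reload them
# for every later request that reuses that prompt (bucket name is made up).
kv = tuple((torch.randn(1, 8, 128, 64), torch.randn(1, 8, 128, 64)) for _ in range(2))
save_kv_cache(kv, bucket="my-kv-bucket", key="prompts/system-v1.pt")
restored = load_kv_cache("my-kv-bucket", "prompts/system-v1.pt")
```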

TL;DR: The latest LMCache release plugs seamlessly into vLLM's new multimodal stack. By hashing image-side tokens (mm_hashes) and caching their key-value (KV) pairs, LMCache reuses vision embeddings across requests—slashing time-to-first-token and GPU memory for visual LLMs. Summary — Why This Matters: Multimodal large language models (MLLMs) multiply KV-cache traffic: every image can add thousands of “vision…
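A minimal sketch of the hash-and-reuse idea, assuming a content hash over raw image bytes as the cache key; the `VisionKVCache` class and the `compute_kv` callback are hypothetical stand-ins for illustration, not LMCache's or vLLM's actual mm_hashes plumbing.

```python
# Illustrative sketch: content-hash each image and reuse the cached KV pairs
# for its vision tokens on a hash hit, skipping the vision encoder + prefill.
import hashlib

class VisionKVCache:
    def __init__(self):
        self._store = {}  # mm_hash -> KV pairs for that image's vision tokens

    @staticmethod
    def mm_hash(image_bytes: bytes) -> str:
        # Stand-in for vLLM's multimodal input hashing.
        return hashlib.sha256(image_bytes).hexdigest()

    def get_or_compute(self, image_bytes: bytes, compute_kv):
        h = self.mm_hash(image_bytes)
        if h not in self._store:      # miss: run the expensive vision prefill once
            self._store[h] = compute_kv(image_bytes)
        return self._store[h]         # hit: reuse the cached vision KV pairs

cache = VisionKVCache()
kv1 = cache.get_or_compute(b"<png bytes>", compute_kv=lambda b: f"kv({len(b)} bytes)")
kv2 = cache.get_or_compute(b"<png bytes>", compute_kv=lambda b: f"kv({len(b)} bytes)")
assert kv1 is kv2  # the second request reuses the cached entry; nothing recomputed
```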

Highlights: Today, LMCache shares two key designs in LLM infrastructure for disaggregated prefill and more. Together, these updates mark a pivotal leap forward in prefill-decode (PD) disaggregation for vLLM, towards better system flexibility and multi-node scale-out capabilities. A high-level architecture diagram of the “vLLM V1 + NIXL + LMCache” integration accompanies the post. vLLM V1 Gets a Major Upgrade with…
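For intuition, here is a minimal sketch of the PD-disaggregation flow under stated assumptions: a plain Python queue stands in for the NIXL transfer layer, and `prefill_worker`/`decode_worker` with their string "KV caches" are hypothetical placeholders, not vLLM or LMCache APIs.

```python
# Illustrative sketch of prefill-decode (PD) disaggregation: one node runs the
# expensive prompt pass and ships the KV cache; another node receives it and
# starts decoding without redoing prefill on its own GPU.
import queue

kv_channel = queue.Queue()  # stand-in for the NIXL GPU-to-GPU transfer path

def prefill_worker(request_id: str, prompt: str) -> None:
    # Prefill node: compute the KV cache once and hand it off.
    kv_cache = f"kv-cache-for({prompt!r})"  # placeholder for real per-layer tensors
    kv_channel.put((request_id, kv_cache))

def decode_worker() -> str:
    # Decode node: pick up the transferred KV cache and generate immediately.
    request_id, kv_cache = kv_channel.get()
    return f"decoding {request_id} using {kv_cache}"

prefill_worker("req-1", "Summarize this 10k-token document...")
print(decode_worker())
```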
