Overview of the Collaboration

The KV Cache is a memory optimization that speeds up the forward pass of Large Language Models (LLMs) during autoregressive generation: it stores the Key (K) and Value (V) projections of previously processed tokens so the model does not have to recompute them over the entire text sequence each time a new token is generated. Maximizing the KV Cache hit rate with storage is…

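To make the mechanism concrete, here is a minimal NumPy sketch of single-head attention with a KV cache. All names (`attention_step`, `kv_cache`, the weight matrices) are illustrative assumptions, not from any particular library; the point is that only the new token's K and V are computed, while past projections are read from the cache.

```python
import numpy as np

def attention_step(x_t, W_q, W_k, W_v, kv_cache):
    """Attend from one newly generated token, reusing cached K/V.

    x_t      : embedding of the new token, shape (d_model,)
    kv_cache : dict holding lists of past K and V row vectors
    """
    q = x_t @ W_q                      # query is needed only for the new token
    kv_cache["k"].append(x_t @ W_k)    # project and cache the new key
    kv_cache["v"].append(x_t @ W_v)    # project and cache the new value

    K = np.stack(kv_cache["k"])        # (seq_len, d_head), no recomputation
    V = np.stack(kv_cache["v"])

    scores = K @ q / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ V                 # attention output for the new token

# Hypothetical usage: generate 5 tokens, one attention step each
rng = np.random.default_rng(0)
d_model, d_head = 8, 4
W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
cache = {"k": [], "v": []}
for _ in range(5):
    out = attention_step(rng.normal(size=d_model), W_q, W_k, W_v, cache)
```

Without the cache, each step would re-project K and V for every earlier token, making per-token cost grow with sequence length; with it, each step does O(1) new projection work plus a dot product against the cached rows.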