performance Archives | LMCache Blog

Nvidia Dynamo + LMCache: Accelerating the Future of LLM Inference

September 7, 2025

Best practices, Performance

collaboration, distributed-inference, dynamo, nvidia, performance

We’re thrilled to announce that the Nvidia Dynamo project has integrated LMCache as its KV caching layer. This is a big milestone: Dynamo gets a battle-tested caching solution, and LMCache becomes part of a production-scale ecosystem used by many developers worldwide.

Why KV Caching Matters

KV caching is a foundational optimization for modern LLM…

How LMCache Turbocharges Enterprise LLM Inference Frameworks

May 16, 2025

Performance

benchmark, ITL, lmcache, PD disaggregation, performance, RAG, TTFT

TL;DR: LMCache, the state-of-the-art KV cache layer library developed by TensorMesh and the project’s open-source community, delivers breakthrough performance improvements to modern enterprise LLM inference frameworks, including the vLLM Production Stack, KServe, and NVIDIA’s Dynamo. With fast and scalable caching of long-context KV cache, LMCache helps reduce inference costs and ensures SLOs for both latency…

High Performance and Easy Deployment of vLLM in K8S with “vLLM production-stack”

January 21, 2025

News

deployment, k8s, kubernetes, performance, production stack, vLLM

The Context

In the AI arms race, it’s no longer just about who has the best model; it’s about who has the best LLM serving system. vLLM has taken the open-source community by storm, with unparalleled hardware and model support plus an active ecosystem of top-notch contributors. But until now, vLLM has mostly focused on…

