LMCache Blog

Initiated and Officially Supported by Tensormesh

Tag: NIXL

Bringing State-Of-The-Art PD Speed to vLLM v1 with LMCache

April 29, 2025

Benchmark

dynamo, lmcache, NIXL, PD disaggregation

TL;DR: In our previous blog post, we introduced **LMCache**’s integration with vLLM v1 and NVIDIA’s NIXL (used in Dynamo), enabling Prefill-Decode (PD) Disaggregation for LLM inference. Today, we’re excited to share benchmark results that confirm this system achieves state-of-the-art PD performance, balancing time-to-first-token (TTFT) and inter-token latency (ITL) with unprecedented consistency. Here’s an example result (scroll down…

Shaping NIXL-based PD Disaggregation in vLLM V1

April 11, 2025

Tutorial

kv cache, NIXL, PD disaggregation, prefill, vLLM

Highlights: Today, LMCache shares two key designs in LLM infrastructure for disaggregated prefill and more. Together, these updates mark a pivotal leap forward in PD disaggregation for vLLM, toward better system flexibility and multi-node scale-out capabilities. A high-level architecture diagram of the “vLLM V1 + NIXL + LMCache” integration: vLLM V1 Gets a Major Upgrade with…
