Initiated and Officially Supported by Tensormesh
TL;DR: In our previous blog, we introduced **LMCache**’s integration with vLLM v1 and NVIDIA’s NIXL (used in Dynamo), enabling Prefill-Decode (PD) Disaggregation for LLM inference. Today, we’re excited to share benchmark results confirming that this system achieves state-of-the-art PD performance, balancing time-to-first-token (TTFT) and inter-token latency (ITL) with unprecedented consistency. Here’s an example result (scroll down…

**Executive Summary**

A picture is worth a thousand words:

[vLLM Production Stack Github] | [Get In Touch] | [Slack] | [Linkedin] | [Twitter]

**Benchmark setups**

**Methods:**

**Workload:** Inspired by our production deployments, we create workloads that emulate a typical chat-bot document-analysis workload. By default, each LLM query input has 9K tokens, including a document…
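To make the workload shape concrete, here is a minimal sketch of how such a synthetic request could be generated. This is an illustration only: the `make_request` helper and its split of the 9K-token input into a long document prefix plus a short question are our assumptions, and the random token IDs stand in for output of a real tokenizer.

```python
import random

def make_request(doc_tokens: int = 8_500, question_tokens: int = 500, seed=None):
    """Build one synthetic chat-bot document-analysis request.

    The input totals ~9K tokens: a long document prefix plus a short
    question, mirroring the workload described above. Token IDs are
    random placeholders (hypothetical 32K vocabulary), not real text.
    """
    rng = random.Random(seed)
    document = [rng.randrange(32_000) for _ in range(doc_tokens)]
    question = [rng.randrange(32_000) for _ in range(question_tokens)]
    return {"prompt_token_ids": document + question}

# Example: one request with a fixed seed for reproducibility.
req = make_request(seed=0)
print(len(req["prompt_token_ids"]))  # 9000 input tokens per query
```

In a real benchmark driver, many such requests would be sent concurrently to the serving endpoint while TTFT and ITL are recorded per request.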
