Supporting Ascend NPUs: We’re delighted to announce that LMCache now officially supports Ascend NPUs with the release of the LMCache-Ascend plugin. LMCache-Ascend supports a broad range of Ascend compute platforms, from the cloud to the edge. This major platform expansion underscores LMCache’s commitment to delivering leading performance across a diverse hardware ecosystem, enabling developers to…

Announcing Tensormesh: First, I want to repeat here what I posted on the LMCache #general Slack channel last week: I am delighted to announce that the team that founded the LMCache project decided a few months ago to form a company, Tensormesh. As we announce the beta of our first product, we have…

We’re thrilled to announce that Nvidia Dynamo has integrated LMCache as a KV caching layer solution. This is a big milestone: Dynamo gets a battle-tested caching solution, and LMCache becomes part of a data center-scale inference platform used by many developers worldwide to deploy AI at scale. For comprehensive details about Dynamo’s KV cache optimization…

We’re thrilled to share that LMCache has officially crossed 5,000 GitHub stars! 🚀 This milestone is not just a number — it’s a strong signal that KV cache technology has become a first-class citizen in the LLM inference stack, and that our community is leading the way. What is LMCache? LMCache is the first open-source…

LMCache now supports OpenAI’s newly released GPT-OSS models (20B and 120B parameters) from day one! This post provides a complete guide to setting up vLLM with LMCache for GPT-OSS models and demonstrates significant performance improvements through our CPU offloading capabilities. Step 1: Install the GPT-OSS version of vLLM and test the installation. Step 2: Install LMCache…
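As a taste of what the guide covers, here is a minimal sketch of serving a GPT-OSS model through vLLM with LMCache offloading KV cache to CPU memory. The connector name, environment variables, and model identifier below are assumptions drawn from LMCache’s public docs rather than the post’s exact commands:

```python
# Minimal sketch, not the post's exact setup: serve a GPT-OSS model with vLLM
# while LMCache offloads KV cache to CPU memory.
import os

# Hypothetical CPU-offload settings; exact variable names may differ by release.
os.environ.setdefault("LMCACHE_LOCAL_CPU", "True")
os.environ.setdefault("LMCACHE_MAX_LOCAL_CPU_SIZE", "40")  # CPU KV budget in GiB

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="openai/gpt-oss-20b",
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # assumed connector name
        kv_role="kv_both",                  # store and retrieve KV through LMCache
    ),
)

out = llm.generate(["Hello, GPT-OSS!"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```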

TL;DR: LLMs are transforming every product and service—from chatbots and copilots to intelligent document search and enterprise workflows. But running LLMs in production is still painfully slow, prohibitively expensive, and complex to manage. That changes today. We’re excited to announce the launch of LMIgnite — the first one-click deployable high-performance LLM inference backend for Conversational…

TL;DR: The latest LMCache release plugs seamlessly into vLLM’s new multimodal stack. By hashing image-side tokens (mm_hashes) and caching their key-value (KV) pairs, LMCache reuses vision embeddings across requests—slashing time-to-first-token and GPU memory for visual LLMs. Why this matters: multimodal large language models (MLLMs) multiply KV-cache traffic: every image can add thousands of “vision…
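To give a rough feel for the mechanism (a conceptual sketch only, not LMCache’s internals), a multimodal KV cache can key vision-token KV entries on content hashes of the input images, so a request with a previously seen image reuses its cached KV pairs instead of recomputing them:

```python
# Conceptual sketch only (not LMCache's actual code): keying KV cache entries
# on content hashes of multimodal inputs so identical images reuse their
# vision-token KV pairs across requests.
import hashlib
from typing import Dict, List, Optional, Tuple

KVPairs = object  # stand-in for the real per-layer key/value tensors


class MultimodalKVCache:
    def __init__(self) -> None:
        self._store: Dict[Tuple[str, ...], KVPairs] = {}

    @staticmethod
    def mm_hash(image_bytes: bytes) -> str:
        # Hash the raw image content; identical images produce identical keys.
        return hashlib.sha256(image_bytes).hexdigest()

    def lookup(self, mm_hashes: List[str]) -> Optional[KVPairs]:
        # A hit means the vision tokens' KV pairs were already computed.
        return self._store.get(tuple(mm_hashes))

    def store(self, mm_hashes: List[str], kv: KVPairs) -> None:
        self._store[tuple(mm_hashes)] = kv


# Usage: two requests carrying the same image share one cached KV entry.
cache = MultimodalKVCache()
h = [MultimodalKVCache.mm_hash(b"<raw image bytes>")]
if cache.lookup(h) is None:
    cache.store(h, kv="computed KV for vision tokens")  # placeholder value
assert cache.lookup(h) is not None  # second request reuses the cached KV
```

Keying on content hashes rather than request ids is what lets the same image, embedded in different prompts, hit the same cache entry.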

TL;DR: Our LLM Production Stack project just hit another milestone. We’re integrating with more hardware accelerators — including Ascend, Arm, and AMD — signaling growing maturity and broader applicability across enterprise and research settings. 🚀 LMCache is gaining traction: it has quietly become the unsung hero of the LLM inference world. As a core component…

We’re delighted to announce that LMCache is joining forces with Red Hat and other industry leaders on some exciting open source project collaborations. LMCache has been selected as a core component of llm-d, a new open source project led by Red Hat to drive more scalable, efficient distributed inference across clusters of vLLM servers…

Overview of the Collaboration: LMCache and Mooncake have announced a strategic collaboration aimed at pioneering a KVCache-centric Large Language Model (LLM) serving system. This partnership seeks to significantly enhance the efficiency, scalability, and responsiveness of LLM applications. By combining LMCache’s advanced KVCache management techniques with Mooncake’s powerful and optimized backend infrastructure, the collaboration aims to…
