Speeding Up LLM Inference: Beyond the Inference Engine
By Junchen Jiang, Hanchen Li, Jake Sonsini
TL;DR: LLMs are rapidly becoming the dominant workload in enterprise AI. As more applications rely on real-time generation, inference performance — measured in speed, cost, and reliability — becomes the key bottleneck. Today, the industry focuses primarily on speeding up inference engines like vLLM, SGLang, and TensorRT. But in doing...