TL;DR: LLMs are rapidly becoming the dominant workload in enterprise AI. As more applications rely on real-time generation, inference performance (speed, cost, and reliability) becomes the key bottleneck. Today, the industry focuses primarily on speeding up inference engines such as vLLM, SGLang, and TensorRT-LLM. But in doing so, we're overlooking a much…

We’re delighted to announce that LMCache is joining forces with Red Hat and other industry leaders on some exciting open source collaborations. LMCache has been selected as a core component of llm-d, a new open source project led by Red Hat to drive more scalable, efficient distributed inference across clusters of vLLM servers…
