The challenge: Scaling enterprise AI Enterprises today are racing to integrate large language models (LLMs) into their products and workflows, but doing so at scale brings challenges in performance, cost, and accuracy. Organizations need models grounded in their specific data, while ensuring that this information remains private. Cohere, one of the leading…

Overview of the Collaboration The KV Cache is a memory optimization that speeds up the forward pass of large language models (LLMs) by storing the Key (K) and Value (V) matrices, so the model does not recompute them for the entire text sequence with every newly generated token. Maximizing the KV Cache hit rate with storage is…
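To make the mechanism concrete, here is a minimal sketch of incremental decoding with a KV cache (plain NumPy, a single attention head, no batching; all names are illustrative and not taken from vLLM or LMCache). Each decode step projects only the newest token into K and V and appends them to the cache, instead of recomputing projections for every earlier token.

```python
# Minimal KV-cache sketch: one attention head, no batching, illustrative only.
import numpy as np

d = 64                                   # head dimension (illustrative)
rng = np.random.default_rng(0)
W_k, W_v, W_q = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []                # the KV cache: one entry per past token

def decode_step(x_new):
    """Attend the newest token over all cached keys/values."""
    # Only the new token is projected; past K/V come straight from the cache.
    k_cache.append(x_new @ W_k)
    v_cache.append(x_new @ W_v)
    q = x_new @ W_q
    K = np.stack(k_cache)                # (seq_len, d)
    V = np.stack(v_cache)                # (seq_len, d)
    scores = K @ q / np.sqrt(d)          # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax over past positions
    return weights @ V                   # attention output for the new token

for _ in range(8):                       # each step does one projection, not seq_len of them
    out = decode_step(rng.standard_normal(d))
```

LMCache builds on this idea by storing and reusing those cached K/V entries across requests and storage tiers, which is why the hit rate of that cache matters so much for cost and latency.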

LMCache now supports OpenAI’s newly released GPT-OSS models (20B and 120B parameters) from day one! This post provides a complete guide to setting up vLLM with LMCache for GPT-OSS models and demonstrates significant performance improvements through our CPU offloading capabilities. Step 1: Installing vLLM GPT OSS Version Installation Test the Installation Step 2: Install LMCache…
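As a rough preview of where such a setup ends up, here is a hedged sketch that pairs vLLM’s offline Python API with the LMCache connector for GPT-OSS (using the Hugging Face model ID openai/gpt-oss-20b). It assumes a recent vLLM build with GPT-OSS support and the lmcache package installed; the connector name (LMCacheConnectorV1), the KVTransferConfig fields, and the LMCACHE_* environment variables follow LMCache’s documented integration pattern, but exact names and values can differ across versions, so treat this as a sketch rather than the post’s exact commands.

```python
# Hedged sketch: vLLM offline API + LMCache connector with CPU offloading.
# Connector/env-var names follow LMCache's documented pattern (assumed, version-dependent).
import os

# CPU offloading knobs (assumed LMCache settings; values are illustrative).
os.environ["LMCACHE_LOCAL_CPU"] = "True"            # keep KV chunks in CPU RAM
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "20.0"   # CPU cache budget in GB
os.environ["LMCACHE_CHUNK_SIZE"] = "256"            # tokens per cached chunk

from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="openai/gpt-oss-20b",
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",          # route KV blocks through LMCache
        kv_role="kv_both",                          # both store and load cached KV
    ),
)

# Repeated long prefixes (e.g. a shared system prompt) are where CPU offloading
# pays off: later requests reuse the prefix's KV instead of re-prefilling it.
long_prefix = "You are a helpful assistant. " * 200
params = SamplingParams(temperature=0.0, max_tokens=64)
print(llm.generate([long_prefix + "Summarize KV caching."], params)[0].outputs[0].text)
```

The CPU budget (LMCACHE_MAX_LOCAL_CPU_SIZE in this sketch) is the lever for the offloading behavior the post benchmarks: KV for long, shared prefixes can spill out of GPU memory and still be reused on subsequent requests.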

TL;DR LMCache, the state-of-the-art KV cache layer library developed by TensorMesh and the project’s open-source community, delivers breakthrough performance improvements to modern enterprise LLM inference frameworks, including the vLLM Production Stack, KServe, and NVIDIA’s Dynamo. With fast and scalable caching of long-context KV data, LMCache helps reduce inference costs and meet SLOs for both latency…
