The challenge: Scaling enterprise AI Enterprises today are racing to integrate large language models (LLMs) into their products and workflows, but doing so at scale brings challenges in performance, cost, and accuracy. Organizations need models grounded in their own data while ensuring that this information remains private. Cohere, one of the leading…

Overview of the Collaboration The KV Cache is a memory optimization that speeds up the forward pass of Large Language Models (LLMs) by storing the Key (K) and Value (V) matrices so the model does not recalculate them for the entire text sequence with every newly generated token. Maximizing the KV Cache hit rate with storage is…
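To make the idea concrete, here is a minimal sketch of a KV cache in a toy decode loop, written from the description above rather than from LMCache's code; the tensor shapes, weight matrices, and the decode_step function are illustrative assumptions only.

```python
# Minimal sketch of the KV-cache idea (illustrative only, not LMCache's code).
# At each decode step, only the new token's K/V are computed; past K/V are
# read back from the cache instead of being recomputed for the whole sequence.
import torch

d_model = 64
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

def decode_step(x_new, kv_cache):
    """x_new: (1, d_model) embedding of the newly generated token."""
    q = x_new @ W_q                      # query for the new token only
    k = x_new @ W_k
    v = x_new @ W_v
    if kv_cache is None:
        K, V = k, v
    else:
        K = torch.cat([kv_cache[0], k])  # reuse cached keys
        V = torch.cat([kv_cache[1], v])  # reuse cached values
    attn = torch.softmax(q @ K.T / d_model ** 0.5, dim=-1)
    out = attn @ V                       # attention over the full prefix
    return out, (K, V)                   # updated cache is carried forward

cache = None
for step in range(5):                    # toy decode loop
    token_embedding = torch.randn(1, d_model)
    out, cache = decode_step(token_embedding, cache)
print(cache[0].shape)                    # keys grow by one row per step: (5, 64)
```

The point to notice is that each decode step only computes projections for the new token; the K and V matrices for the rest of the sequence come from the cache, which is exactly the recomputation the KV Cache avoids.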

In large language model inference, the performance and flexibility of the KVCache system directly affect overall serving efficiency. LMCache, a high-performance caching framework for large models, gives developers rich extension capabilities through its modular backend design. This article starts from the LMCache backend extension mechanism, using the officially provided lmc_external_log_backend as an example,…
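As a rough sketch of what a plugin-style backend with logging could look like, here is a self-contained toy version; the class names and the put/get/contains signatures below are assumptions for illustration, not the actual LMCache backend interface or the real lmc_external_log_backend, so consult the article for the official API.

```python
# Hypothetical sketch of a pluggable cache backend that logs every operation.
# The base class and method signatures are assumptions for illustration only.
import logging
from typing import Optional

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("lmc_external_log_backend")

class InMemoryBackend:
    """Trivial key-value store standing in for a real storage backend."""
    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._store.get(key)

    def contains(self, key: str) -> bool:
        return key in self._store

class LoggingBackend(InMemoryBackend):
    """Wraps the base backend and logs hits/misses, mirroring the idea of an
    external logging backend plugged into a modular cache framework."""
    def put(self, key: str, value: bytes) -> None:
        log.info("put key=%s size=%d bytes", key, len(value))
        super().put(key, value)

    def get(self, key: str) -> Optional[bytes]:
        value = super().get(key)
        log.info("get key=%s hit=%s", key, value is not None)
        return value

backend = LoggingBackend()
backend.put("chunk-0", b"serialized KV tensors")
backend.get("chunk-0")    # logs a cache hit
backend.get("chunk-1")    # logs a cache miss
```

The design choice this mirrors is that the logging backend does not reimplement storage; it wraps an existing backend and adds observability, which is what a modular backend design makes cheap to do.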

TL;DR: 🚀 CacheGen lets you store KV caches on disk or AWS S3 and load them way faster than recomputing! It compresses your KV cache up to 3× smaller than quantization, so you can load it blazingly fast while keeping response quality high. Stop wasting compute: use CacheGen to fully utilize…
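To illustrate the trade-off CacheGen targets, here is a toy sketch that persists a quantized KV cache to disk and reloads it instead of re-running the prefill; the tensor layout, the int8 quantization, and the file format are placeholder assumptions, not CacheGen's actual codec or storage path.

```python
# Toy illustration of persisting a KV cache and reloading it later instead of
# recomputing the prefill. The 8-bit quantization is a stand-in; CacheGen's
# codec compresses the KV cache further than quantization alone.
import torch

def quantize_int8(t: torch.Tensor):
    """Symmetric per-tensor int8 quantization (illustrative only)."""
    scale = t.abs().max() / 127.0
    return (t / scale).round().to(torch.int8), scale

def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

# Pretend this is the KV cache produced by prefilling a long document:
# shape = (num_layers, 2 for K/V, seq_len, num_heads * head_dim)
kv_cache = torch.randn(4, 2, 1024, 256)

# Persist: quantize each layer and save one file (a local path or an S3 object).
packed = [quantize_int8(layer) for layer in kv_cache]
torch.save(packed, "kv_cache.pt")

# Later request with the same prefix: load and dequantize instead of
# re-running the prefill forward pass over the whole document.
restored = torch.stack(
    [dequantize_int8(q, s) for q, s in torch.load("kv_cache.pt")]
)
print(restored.shape)  # torch.Size([4, 2, 1024, 256])
```

Loading a file like this is mostly I/O-bound, which is why shrinking the serialized cache beyond plain quantization, as the post describes, translates directly into faster KV cache loads.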

Overview of the Collaboration LMCache and Mooncake have announced a strategic collaboration aimed at pioneering a KVCache-centric Large Language Model (LLM) serving system. This partnership seeks to significantly enhance the efficiency, scalability, and responsiveness of LLM applications. By combining LMCache’s advanced KVCache management techniques with Mooncake’s powerful and optimized backend infrastructure, the collaboration aims to…
