The standard approach to reducing LLM inference costs is prefix caching, which reuses previously computed token states to avoid redundant computation. In practice, however, this approach misses significant caching opportunities in real-world agentic workloads. Caching in agentic workflows is different: shared content (e.g., retrieved contexts and documents) frequently appears across requests at varied positions,…
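To make the position sensitivity concrete, below is a minimal, hypothetical sketch of a strict token-prefix cache (not LMCache's actual implementation): a shared document's cached state is only reusable when every token before it also matches, so the same document at a shifted position is a total miss.

```python
# Hypothetical sketch of strict prefix caching; not LMCache's implementation.
import hashlib

class PrefixCache:
    """Caches per-prefix KV state, keyed by a hash of the exact token prefix."""

    def __init__(self):
        self.store = {}  # prefix hash -> cached KV state (placeholder here)

    def _key(self, tokens):
        return hashlib.sha256(" ".join(map(str, tokens)).encode()).hexdigest()

    def insert(self, tokens):
        # Cache every prefix of the request (real systems do this per block).
        for end in range(1, len(tokens) + 1):
            self.store[self._key(tokens[:end])] = object()

    def longest_cached_prefix(self, tokens):
        # Walk from the longest prefix down; report how many tokens are reusable.
        for end in range(len(tokens), 0, -1):
            if self._key(tokens[:end]) in self.store:
                return end
        return 0

cache = PrefixCache()
doc = [101, 102, 103, 104]        # token ids of a shared retrieved document
cache.insert([1, 2] + doc)        # request A: doc appears after preamble [1, 2]

print(cache.longest_cached_prefix([1, 2] + doc))     # 6: identical prefix, full hit
print(cache.longest_cached_prefix([7, 8, 9] + doc))  # 0: same doc, new position, miss
```

The second lookup is exactly the agentic case: the retrieved document is byte-identical, but because the tokens in front of it changed, a strict prefix key never matches and its KV state is recomputed from scratch.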

Over the last few months, Claude Code has quietly become one of the most interesting and widely adopted real-world agentic systems available to everyday developers. Unlike cloud-only agents such as Perplexity, Devin, or Manus, whose internals remain hidden behind API gateways, and unlike fully open-source agents such as Mini SWE Agent or Terminus 2, where you can…

Breaking News: “CacheBlend” Receives BEST PAPER AWARD at ACM EuroSys 2025. This week at ACM EuroSys 2025 (a top academic conference in computer systems), Jiayi Yao, first author of the groundbreaking CacheBlend paper, will present our work on improving LLM inference efficiency, particularly in retrieval-augmented generation (RAG) applications. This paper has…

TL;DR: Your RAG pipeline can run up to 4.5× faster by pairing vLLM with LMCache. [💻 Source code] [📚 Paper] The paper will appear at the 10th ACM EuroSys (European Conference on Computer Systems), 2025. [🎬 3-minute introduction video] The Problem: RAG is WAY TOO SLOW. Retrieval-Augmented Generation (RAG) has become a key technique in…
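As a rough illustration of the effect, the sketch below times the same long RAG-style prompt twice against an OpenAI-compatible endpoint; it assumes a vLLM server with LMCache enabled is already listening at localhost:8000, and the model name is a placeholder. On the second request, the KV states for the shared context can be served from cache, so time-to-first-token should drop.

```python
# Hedged sketch: compare time-to-first-token (TTFT) for a cold vs. warm request.
# Assumes a vLLM + LMCache server at localhost:8000; model name is a placeholder.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")

context = "retrieved document text ... " * 500  # stand-in for a long RAG context
prompt = context + "\n\nQuestion: summarize the document."

def ttft(prompt: str) -> float:
    """Return seconds until the first streamed token arrives."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=32,
    )
    for _ in stream:  # the first chunk marks TTFT
        return time.perf_counter() - start
    return float("inf")

cold = ttft(prompt)  # prefill computed from scratch
warm = ttft(prompt)  # cached KV states for the shared context can be reused
print(f"cold TTFT: {cold:.2f}s, warm TTFT: {warm:.2f}s")
```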
