LMCache Blog

Initiated and Officially Supported by Tensormesh

Category: AMD

Benchmarking LMCache for Multi-Turn Agentic Workloads on AMD MI300X

May 12, 2026

AMD, Benchmark, lmcache, Performance

A practitioner’s guide to KV-cache tiering on ROCm: what works, what doesn’t, and the regime where it actually matters.

Key Summary: We benchmarked multi-turn agentic workloads using 739 anonymized Claude Code conversation traces from kv-cache-tester against MiniMax-M2.5 (230 GB FP8 MoE) on 2× AMD MI300X with vLLM 0.19.0 + LMCache (built from source for…

AMD × LMcache: AMD GPU Acceleration with LMcache

January 9, 2026

AMD, Benchmark, lmcache

Introduction: LLM inference becomes increasingly challenging as context length grows and workloads scale. Traditional serving engines rely on prefix-based KV cache reuse, which limits opportunities for optimization, especially when processing long, repeated, or overlapping text across different requests. LMCache addresses this challenge. It is an extension to LLM serving engines that dramatically reduces time-to-first-token (TTFT)…
