LMCache Blog

Initiated and Officially Supported by Tensormesh

Tag: gke

LMCache on Google Kubernetes Engine: Boosting LLM Inference Performance with KV Cache on Tiered Storage

October 7, 2025

Benchmark

benchmark, gke, Google, storage, vLLM

Overview of the Collaboration: The KV Cache is a memory optimization that speeds up the forward pass of Large Language Models (LLMs) by storing the Key (K) and Value (V) matrices, so the model does not recalculate them for the entire text sequence each time it generates a new token. Maximizing the KV Cache hit rate with storage is…

Deploying LLMs in Clusters #2: running “vLLM production-stack” on AWS EKS and GCP GKE

February 20, 2025

Tutorial

aws, eks, gcp, gke, lambda, lambda lab, production stack, vLLM

The Context: vLLM has taken the open-source community by storm, with unparalleled hardware and model support plus an active ecosystem of top-notch contributors. But until now, vLLM has mostly focused on single-node deployments. vLLM Production-stack is an…
