k8s Archives | LMCache Blog

Open-Source LLM Inference Cluster Performing 10x FASTER than SOTA OSS Solution

March 6, 2025

Benchmark

k8s, kubernetes, production stack, QPS, router, TTFT, vLLM

A picture is worth a thousand words. Executive Summary. Benchmark setups. Methods. Workload: Inspired by our production deployments, we create workloads that emulate a typical chatbot document-analysis workload. By default, each LLM query input has 9K tokens, including a document…

Read more: Open-Source LLM Inference Cluster Performing 10x FASTER than SOTA OSS Solution

AGI Infra for All: vLLM Production Stack as the Standard for Scalable vLLM Serving

March 2, 2025

New features

k8s, kubernetes, production stack, vLLM

TL;DR Why vLLM Production Stack? AGI isn’t just about better models; it is also about better systems to serve those models to the wider public, so that everyone has access to the new capabilities! To fully harness the power of Generative AI, every organization that takes this AI revolution seriously needs to have…

Read more: AGI Infra for All: vLLM Production Stack as the Standard for Scalable vLLM Serving

Deploying LLMs in Clusters #1: running “vLLM production-stack” on a cloud VM

February 13, 2025

Tutorial

deployment, k8s, kubernetes, production stack, vLLM

TL;DR [GitHub Link] | [More Tutorials] | [Interest Form]. The Context: vLLM has taken the open-source community by storm, with unparalleled hardware and model support plus an active ecosystem of top-notch contributors. But until now, vLLM has mostly focused on single-node deployments. vLLM Production-stack is an open-source reference implementation of an…

Read more: Deploying LLMs in Clusters #1: running “vLLM production-stack” on a cloud VM

High Performance and Easy Deployment of vLLM in K8S with “vLLM production-stack”

January 21, 2025

News

deployment, k8s, kubernetes, performance, production stack, vLLM

TL;DR The Context: In the AI arms race, it’s no longer just about who has the best model; it’s about who has the best LLM serving system. vLLM has taken the open-source community by storm, with unparalleled hardware and model support plus an active ecosystem of top-notch contributors. But until now, vLLM has mostly focused on…

Read more: High Performance and Easy Deployment of vLLM in K8S with “vLLM production-stack”