Tech Explained Archives

vLLM + LMCache: A Starter Guide, No GPU Required

June 23, 2026

lmcache, Tech Explained, Tutorial

Get started easily: a single MacBook is all you need to develop vLLM + LMCache. For New Contributors · Covering Frontend / L1 Eviction / L2 Storage / Observability. If you ever skipped LMCache because you didn’t have a GPU on hand, this guide was written for you. LMCache’s multi-platform framework has already decoupled the GPU…

OpenAI API Is the New IPv4

May 20, 2026

lmcache, Tech Explained

A new system stack is quietly taking shape around LLM serving. What makes it interesting is not just how quickly it is evolving, but how familiar the shape of that evolution looks if you’ve spent time studying large-scale systems like the internet or web infrastructure. These systems were never cleanly designed from first principles; they…

DeepSeek V4 explained, and why it matters to your wallet

May 4, 2026

Tech Explained

DeepSeek V4 is an open-weight model that offers state-of-the-art intelligence at a potentially much lower token price than its predecessor, DeepSeek V3.2. But how does DeepSeek V4 do that? Prerequisite: attention, KV caches, and why the KV cache is the key factor in token pricing. To know why DeepSeek V4 can…

What is TurboQuant and why it matters for LLM inference, in layman’s terms

April 15, 2026

Tech Explained

TL;DR: TurboQuant lets you fit 4x more context on your GPU without blowing up GPU memory or degrading the model’s intelligence. It does so by quantizing the memory of large language models, known as the KV cache, an important bottleneck that Jensen Huang mentioned multiple times at this year’s GTC. It relies on two secret…

