award aws benchmark cacheblend cachegen CAIOS claude-code collaboration deployment distributed-inference dynamo gke k8s kernel kubernetes kv cache lambda lambda lab LLM llm-d lmcache LMIgnite Modula mooncake NIXL nvidia one click OpenClaw orchestration orchestrator paper PD disaggregation performance prefill production stack pytorch quantization RAG scale storage tencent tensormesh TPU TTFT vLLM