Arm Ascend benchmark cacheblend cachegen CAIOS cohere collaboration connector coreweave CUDA decoding deployment dynamo eks gcp gke gpt-oss k8s kernel kubernetes kv cache LLM llm-d lmcache lmignite Modula mooncake NIXL nvidia OpenAI paper PD disaggregation performance prefill production stack pytorch RAG spec decode speculative storage tencent tensormesh TTFT vLLM
