
Are you a vLLM user? Change just ONE line of code to unlock 100x more KV cache storage power!

By the LMCache Team

Figure: the one-line change from 'import vllm' to 'from lmcache_vllm import vllm', unlocking 100x more KV cache storage and faster LLM inference.

Are you a vLLM user? Unlock 100x more KV cache storage space for your multi-round conversation and document QA applications using LMCache! Just ONE line change to your code!

Offline inference

For offline inference, you can enable LMCache in two steps:

First, run

pip install lmcache lmcache_vllm

And then change

import vllm

to

from lmcache_vllm import vllm

and now you are good to go!

Like in the following example:

"""
simply change
    import vllm
to
"""
from lmcache_vllm import vllm
"""
and you are good to go!
"""

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = vllm.SamplingParams(temperature=0.8, top_p=0.95)

llm = vllm.LLM(model="facebook/opt-125m")

outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

Online serving

If you prefer using vLLM through its OpenAI API server, you can also enable LMCache in two steps:

First, run

pip install lmcache lmcache_vllm

and then replace vllm serve with lmcache_vllm serve. For example, you can change

vllm serve lmsys/longchat-7b-16k --gpu-memory-utilization 0.8

to

lmcache_vllm serve lmsys/longchat-7b-16k --gpu-memory-utilization 0.8

and now your LMCache-augmented vLLM server is up and ready for use!
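Your clients do not need to change either: the server exposes the same OpenAI-compatible API as vanilla vLLM. Here is a minimal sketch using the openai Python client, assuming the server keeps vLLM's default address of localhost:8000 (adjust base_url if you pass --host or --port):

from openai import OpenAI

# Point the standard OpenAI client at the LMCache-augmented vLLM server.
# base_url assumes the default localhost:8000; the api_key is a placeholder,
# since the local server does not require one by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="lmsys/longchat-7b-16k",
    prompt="The capital of France is",
    max_tokens=32,
    temperature=0.8,
)
print(completion.choices[0].text)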

Contact Us

Interested? Check out our GitHub repo at github.com/LMCache/LMCache!
