Extending LMCache Backends: A Comprehensive Guide to Custom Backend Development

Learn how to build custom backends for LMCache using the external backend extension mechanism

By Baolong, Kobe

Abstract

In large language model inference, the performance and flexibility of the KVCache caching system directly impact overall serving efficiency. LMCache, a high-performance caching framework for large models, gives developers rich extension capabilities through its modular backend design. Starting from LMCache's backend extension mechanism and using the officially provided lmc_external_log_backend as an example, this article details how to build custom backends that meet your business requirements, covering design principles, implementation details, and practical steps. We hope this guide makes it easy for vendors to extend LMCache with their own backends.

LMCache’s Existing Backends

LMCache Backend Overview

LMCache Backend Architecture Overview

As shown in the diagram above, LMCache currently provides numerous backends that serve as storage and transmission adapters for KVCache.

Among them, LocalCpuBackend is the most commonly used backend and is essential in most cases: besides serving as a hot cache, it is also responsible for allocating the MemoryObj instances needed by the storage and retrieval paths.

Another special backend is NixlBackend, whose purpose is to transmit KVCache to peer nodes rather than to local or remote storage.

In addition, RemoteBackend is currently the most easily extensible backend, adopting the Connector framework, which allows storage providers to integrate with LMCache more conveniently.

Finally, the highlighted ExternalBackend is the dynamically extensible external backend that this article will introduce in detail.

Why Extend LMCache Backends?

By default, LMCache provides common backends based on memory, local disk, GDS, and so on. In actual production environments, however, developers may face the following customization needs:

  • Special storage medium adaptation: Need to integrate with distributed storage (such as Ceph), non-volatile memory (NVM), and other special media
  • Customized persistence strategies: Need to implement specific persistence logic such as write-ahead logging (WAL) or snapshot-based approaches to ensure data reliability
  • Business-level monitoring and auditing: Need to embed business logic such as log reporting and performance statistics into cache operations
  • Compatibility and compliance: Need to adapt to internal storage protocols or meet requirements for data encryption, access control, and other compliance needs

lmc_external_log_backend can be viewed as an extension example designed for “logging” scenarios. It provides an excellent reference for understanding the extension mechanism by implementing LMCache’s abstract interface and recording cache operations in log format.

Core Mechanism of LMCache Backend Extension

LMCache adopts an “abstract interface + concrete implementation” design pattern, where all backends must follow unified interface specifications to ensure decoupling from the framework’s core logic. Before starting development, you need to master the following core concepts:

Core Abstract Interface Definition

LMCache defines the general interface in StorageBackendInterface in lmcache/v1/storage_backend/abstract_backend.py.

StorageBackendInterface Structure

StorageBackendInterface Class Structure

The core methods that extended backends need to implement include:

  • __init__(self, config, metadata, loop, memory_allocator, local_cpu_backend: LocalCPUBackend, dst_device, lookup_server=None): Initialize the backend (such as opening files, connecting to storage); the constructor should follow the latest definition in the create_dynamic_backends method in LMCache's lmcache/v1/storage_backend/__init__.py to maintain consistency
  • contains(self, key: CacheEngineKey, pin: bool = False) -> bool: Check if the specified key exists
  • exists_in_put_tasks(self, key: CacheEngineKey) -> bool: Check if the key is in asynchronous put tasks
  • batched_submit_put_task(self, keys: Sequence[CacheEngineKey], objs: List[MemoryObj], transfer_spec=None): Submit a batch of put tasks at once
  • submit_prefetch_task(self, key: CacheEngineKey) -> bool: Submit a prefetch task
  • get_blocking(self, key: CacheEngineKey) -> Optional[MemoryObj]: Get the key's content in blocking mode
  • get_non_blocking(self, key: CacheEngineKey) -> Optional[Future]: Get the key's content asynchronously, returning a Future object to wait on for the data
  • batched_get_blocking(self, keys: List[CacheEngineKey]) -> List[Optional[MemoryObj]]: Get the contents of a batch of keys in blocking mode at once
  • pin(self, key: CacheEngineKey) -> bool: Pin the MemoryObj corresponding to this key to prevent eviction
  • unpin(self, key: CacheEngineKey) -> bool: Unpin this key so it can be evicted
  • remove(self, key: CacheEngineKey, force: bool = True) -> bool: Delete this key
  • batched_remove(self, keys: list[CacheEngineKey], force: bool = True) -> int: Delete a batch of keys at once, returning the actual number of deleted objects

Any custom backend only needs to inherit the StorageBackendInterface abstract class and implement the above interfaces to seamlessly integrate into LMCache.
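Concretely, a vendor backend starts from a skeleton like the one below. This is a minimal, hypothetical sketch: the class name MyVendorBackend is a placeholder, the signatures come from the table above, and everything should be checked against the current StorageBackendInterface definition. A complete worked example, ExternalLogBackend, is walked through later in this article.

# Hypothetical skeleton of a custom backend; names and comments are
# illustrative, not part of LMCache itself.
from lmcache.v1.storage_backend.abstract_backend import StorageBackendInterface


class MyVendorBackend(StorageBackendInterface):
    def __init__(self, config, metadata, loop, memory_allocator,
                 local_cpu_backend, dst_device, lookup_server=None):
        super().__init__(dst_device=dst_device)
        # Vendor-specific settings (endpoints, credentials, ...) can be read
        # from the LMCache config here
        self.config = config

    def contains(self, key, pin=False):
        raise NotImplementedError  # query your storage for the key

    def batched_submit_put_task(self, keys, objs, transfer_spec=None):
        raise NotImplementedError  # persist each MemoryObj to your storage

    def get_blocking(self, key):
        raise NotImplementedError  # read the key back as a MemoryObj

    # ...implement the remaining methods from the table above...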

Process for Using Extended Backends

LMCache loads custom backends through the following process, achieving decoupling through “develop extension - install extension - configure extension”:

  1. Develop Extension: Inherit StorageBackendInterface, reference lmc_external_log_backend, implement interface functions, then package into a wheel package
  2. Install Extension: Install the extension locally via pip install <backend whl>
  3. Configure Extension: Specify external_backends: "<new extension backend>" in the LMCache configuration file, for example log_external_backend for your own extended backend. You can also provide comma-separated values to specify multiple external backends

Additionally, you need to configure extra_config to supply the extra settings for each custom external backend, in particular its module_path and class_name so that LMCache can locate and load the class.

Here’s a configuration example using log_external_backend:

chunk_size: 64
local_cpu: False
max_local_cpu_size: 5
external_backends: "log_external_backend"
extra_config:
  external_backend.log_external_backend.module_path: lmc_external_log_backend.lmc_external_log_backend
  external_backend.log_external_backend.class_name: ExternalLogBackend
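These two extra_config keys tell LMCache where the backend lives: the Python module path of the installed extension package and the class name to instantiate. Conceptually, the dynamic loading boils down to something like the sketch below; this is a simplified illustration only, and load_external_backend_class is a made-up helper name, with the authoritative logic living in the create_dynamic_backends method in lmcache/v1/storage_backend/__init__.py.

# Simplified illustration of how module_path / class_name are resolved;
# not the actual LMCache implementation.
import importlib

def load_external_backend_class(module_path: str, class_name: str):
    # e.g. module_path = "lmc_external_log_backend.lmc_external_log_backend"
    #      class_name  = "ExternalLogBackend"
    module = importlib.import_module(module_path)
    # The returned class is expected to inherit StorageBackendInterface
    return getattr(module, class_name)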

Custom External Backend Example: lmc_external_log_backend Introduction

lmc_external_log_backend is an official backend example provided by LMCache. It is the simplest possible implementation of the required interfaces: every operation just prints a log message. Its core goal is to demonstrate the backend extension approach.

ExternalLogBackend Implementation

ExternalLogBackend Class Implementation Structure

As shown in the diagram above, ExternalLogBackend inherits from LMCache’s StorageBackendInterface and implements the agreed-upon __init__ method and other abstract methods.

1. Backend Initialization

# Imports (logger, StorageBackendInterface, LocalCPUBackend, etc.) are omitted
# here; see the example repository for the complete module.
class ExternalLogBackend(StorageBackendInterface):
    def __init__(
        self,
        config,
        metadata,
        loop,
        memory_allocator,
        local_cpu_backend: LocalCPUBackend,
        dst_device,
        lookup_server=None,
    ):
        super().__init__(dst_device=dst_device)
        self.config = config
        self.metadata = metadata
        self.loop = loop
        self.memory_allocator = memory_allocator
        self.lookup_server = lookup_server
        logger.info(f"ExternalLogBackend initialized for device {dst_device}")

2. Implementing Core Methods

Here are a few example methods; for complete implementation, please refer to the code repository:

def contains(self, key: CacheEngineKey, pin: bool = False) -> bool:
    logger.info(f"ExternalLogBackend Checking contains for key: {key}")
    return False

def batched_submit_put_task(
    self,
    keys: List[CacheEngineKey],
    objs: List[MemoryObj],
    transfer_spec=None,
) -> Optional[List[Future]]:
    for key, obj in zip(keys, objs):
        # Record storage operation for each key
        logger.info(f"ExternalLogBackend Put task for key: {key}, size: {obj.tensor.size()}")
    return None

def submit_prefetch_task(self, key: CacheEngineKey) -> bool:
    logger.info(f"ExternalLogBackend Prefetch task for key: {key}")
    return True
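The remaining methods follow the same pattern: log the call and return a trivial value, since this backend never stores anything. The sketch below shows what they look like in that style; the exact bodies may differ slightly in the repository.

def get_blocking(self, key: CacheEngineKey) -> Optional[MemoryObj]:
    # Nothing is ever stored, so a lookup never returns data
    logger.info(f"ExternalLogBackend Blocking get for key: {key}")
    return None

def pin(self, key: CacheEngineKey) -> bool:
    logger.info(f"ExternalLogBackend Pin key: {key}")
    return True

def remove(self, key: CacheEngineKey, force: bool = True) -> bool:
    logger.info(f"ExternalLogBackend Remove key: {key}")
    return True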

3. Complete Implementation Reference

This log backend doesn’t actually store any data; it records every operation to the logging system. For the complete code, please refer to the lmc_external_log_backend GitHub repository, which also explains how to package the backend, build the wheel, and install it, so this article won’t repeat those steps.

Summary

LMCache’s external backend extension mechanism provides developers with powerful customization capabilities:

  1. Flexible Integration: Supports various storage systems
  2. Non-intrusive: No need to modify existing LMCache repository code
  3. Dynamic Loading: New backends can be enabled through dependency installation and configuration only
  4. Easy Development: Clear interfaces and documentation support

Whether you need to add monitoring, support new hardware, or implement custom storage strategies, extending LMCache backends can efficiently achieve these goals. The lmc_external_log_backend project is an excellent starting point that can help you quickly understand the backend development process.
