waLLMartCache: A Distributed, Multi-tenant and Enhanced Semantic Caching System for LLMs


Research field: Databases



Conference: International Conference on Pattern Recognition


Abstract

Large Language Models (LLMs) have recently become an important tool for a range of business use cases. They can improve developer productivity, serve as a knowledge base, and power applications such as question answering and information retrieval systems. Unfortunately, these benefits come at the price of high usage costs and latency. Managed LLM services charge by the number of tokens processed, a cost that becomes very significant at scale. Even self-hosted open-source LLMs turn out to be expensive, because hosting requires costly GPUs and serving many requests requires substantial horizontal scaling of those resources. In this context, an LLM-focused caching system can significantly reduce both usage costs and latency. GPTCache addresses this problem. The present work, termed waLLMartCache, advances GPTCache with the following features: (i) we add support for a new database, Redis, to GPTCache (our pull request has already been merged into the GPTCache main branch), and use it as the L2 cache in our system; (ii) GPTCache is currently implemented to run on a single node, which we extend to span multiple nodes to handle industry-scale request volumes, and we consequently also design a distributed eviction manager; (iii) we create partitions for individual tenants (clients) so that they can be hosted together while maintaining semantic separation; (iv) we present a decision engine that decides whether to cache an LLM response based on our business use cases; and (v) we show that loading FAQs (which can be configured to persist in memory) when the LLM cache boots is a simple yet effective strategy that boosts cache hits significantly. Although this system is in-house to our company, we believe the methodology shared in this paper is generic enough to be adopted by any organization.
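To make features (iii) to (v) concrete, the following is a minimal single-process sketch, not the authors' implementation. It assumes a toy bag-of-words embedding in place of a real embedding model, and every name in it (`TenantSemanticCache`, `preload_faqs`, `should_cache`, the threshold and capacity values) is hypothetical. The Redis L2 layer and the distributed eviction manager from features (i) and (ii) are deliberately left out.

```python
import math
from collections import Counter, OrderedDict


def embed(text):
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class TenantSemanticCache:
    """Hypothetical per-tenant semantic cache with pinned FAQ entries."""

    def __init__(self, capacity=2, threshold=0.8, should_cache=None):
        self.capacity = capacity      # max evictable entries per tenant
        self.threshold = threshold    # min similarity counted as a hit
        # Decision hook (assumption): caches everything unless overridden.
        self.should_cache = should_cache or (lambda query, response: True)
        # tenant -> OrderedDict(query -> (vector, response, pinned))
        self.partitions = {}

    def _partition(self, tenant):
        return self.partitions.setdefault(tenant, OrderedDict())

    def preload_faqs(self, tenant, faqs):
        """Load FAQ pairs at boot time; pinned entries are never evicted."""
        part = self._partition(tenant)
        for question, answer in faqs.items():
            part[question] = (embed(question), answer, True)

    def get(self, tenant, query):
        """Return the most similar cached response in this tenant's
        partition, or None if nothing reaches the threshold."""
        part = self._partition(tenant)
        query_vec = embed(query)
        best_key, best_score = None, 0.0
        for key, (vec, _, _) in part.items():
            score = cosine(query_vec, vec)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is None or best_score < self.threshold:
            return None
        part.move_to_end(best_key)    # mark as most recently used
        return part[best_key][1]

    def put(self, tenant, query, response):
        """Store a response if the decision hook allows it, then evict
        least recently used non-pinned entries beyond capacity."""
        if not self.should_cache(query, response):
            return False
        part = self._partition(tenant)
        if query in part and part[query][2]:
            return False              # never overwrite a pinned FAQ
        part[query] = (embed(query), response, False)
        part.move_to_end(query)
        evictable = [k for k, (_, _, pinned) in part.items() if not pinned]
        while len(evictable) > self.capacity:
            del part[evictable.pop(0)]
        return True
```

In this sketch a lookup only ever scans the caller's own partition, so a question cached for one tenant cannot be served to another, and pinned FAQ entries survive eviction regardless of how many other entries are added.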


Authors

Soumik Dasgupta, Walmart Global Tech, Bangalore, India
Anurag Wagh, Walmart Global Tech, Bangalore, India
Lalitdutt Parsai, Walmart Global Tech, Bangalore, India

Paper information

Publication year: 2024
Citations: 0
Country of publication: India
Site: Springer
