Research field: Databases
Conference: International Conference on Pattern Recognition
Large Language Models (LLMs) have recently become an important tool for a range of business use cases. They can improve developer productivity, serve as a knowledge base, and power applications such as question answering and information retrieval systems. Unfortunately, these benefits come at the price of high usage costs and latency. Managed LLM services charge by the number of tokens (roughly, words or word fragments) processed, and these charges become significant at scale. Even self-hosted open-source LLMs turn out to be expensive, because hosting requires costly GPUs and serving many requests requires substantial horizontal scaling of those resources. In this context, an LLM-focused caching system can significantly reduce both usage costs and latency. GPTCache addresses this problem. The current work, termed waLLMartCache, advances GPTCache with the following features: (i) we add support for a new database, Redis, to GPTCache (our pull request has already been merged into the GPTCache main branch), and use it as the L2 cache in our system; (ii) GPTCache is currently implemented to run on a single node, which we extend to span multiple nodes in order to handle industry-scale request volumes, and we consequently also design a distributed eviction manager; (iii) we create partitions for individual tenants (clients) so that they can be hosted together while remaining semantically separated; (iv) we present a decision engine that decides whether to cache an LLM response based on our business use cases; and (v) we show that loading FAQs (which can be configured to persist in memory) when the LLM cache boots is a simple yet effective strategy for significantly increasing cache hits. Although this system is in-house to our company, we believe the methodology shared in this paper is generic enough to be adopted by any organization.
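The features listed in the abstract can be illustrated with a minimal Python sketch. It is not the waLLMartCache or GPTCache implementation, and every name in it (`TenantCache`, `preload_faqs`, `simple_decision`) is hypothetical. It makes several simplifying assumptions: lookups use normalized exact-match keys rather than the embedding-based semantic matching an LLM cache would use, a plain dictionary stands in for the Redis L2 store, and eviction is a per-tenant, single-process LRU rather than a distributed eviction manager.

```python
from collections import OrderedDict
from typing import Callable, Dict, Optional


class TenantCache:
    """Illustrative two-level cache with per-tenant partitions."""

    def __init__(self, l1_capacity: int = 2) -> None:
        self.l1_capacity = l1_capacity
        # tenant -> LRU-ordered, evictable in-memory entries (L1)
        self.l1: Dict[str, "OrderedDict[str, str]"] = {}
        # tenant -> preloaded FAQ entries that are never evicted
        self.pinned: Dict[str, Dict[str, str]] = {}
        # Stand-in for a shared L2 store such as Redis (assumption).
        self.l2: Dict[str, str] = {}

    @staticmethod
    def _key(tenant: str, prompt: str) -> str:
        # Prefixing with the tenant keeps partitions separate in shared storage.
        return f"{tenant}:{' '.join(prompt.lower().split())}"

    def preload_faqs(self, tenant: str, faqs: Dict[str, str]) -> None:
        """Load FAQ answers at boot time so they are served from memory."""
        partition = self.pinned.setdefault(tenant, {})
        for question, answer in faqs.items():
            partition[self._key(tenant, question)] = answer

    def _put_l1(self, tenant: str, key: str, value: str) -> None:
        lru = self.l1.setdefault(tenant, OrderedDict())
        lru[key] = value
        lru.move_to_end(key)
        while len(lru) > self.l1_capacity:
            lru.popitem(last=False)  # evict the least recently used entry

    def get(self, tenant: str, prompt: str) -> Optional[str]:
        key = self._key(tenant, prompt)
        pinned = self.pinned.get(tenant, {})
        if key in pinned:
            return pinned[key]
        lru = self.l1.setdefault(tenant, OrderedDict())
        if key in lru:
            lru.move_to_end(key)
            return lru[key]
        if key in self.l2:
            value = self.l2[key]
            self._put_l1(tenant, key, value)  # promote an L2 hit into L1
            return value
        return None  # cache miss: the caller would query the LLM

    def put(self, tenant: str, prompt: str, response: str,
            should_cache: Callable[[str, str], bool]) -> bool:
        """Store a response only if the decision function approves it."""
        if not should_cache(prompt, response):
            return False
        key = self._key(tenant, prompt)
        self.l2[key] = response
        self._put_l1(tenant, key, response)
        return True


def simple_decision(prompt: str, response: str) -> bool:
    # Hypothetical rule: skip empty responses and time-sensitive prompts.
    return bool(response.strip()) and "today" not in prompt.lower()
```

In this sketch, a lookup checks the pinned FAQ entries first, then the tenant's L1 partition, then the shared L2 store. An entry evicted from L1 can still be served from L2 and is promoted back into L1 on access.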
| Publication year | 2024 |
|---|---|
| Citations | 0 |
| Country of publication | India |
| Site | Springer |