Improving the Accuracy of Text-to-SQL Tools Based on Large Language Models for Real-World Relational Databases


Research field: Databases



Conference: International Conference on Database and Expert Systems Applications


Abstract

Real-world relational databases (RW-RDBs) have large, complex schemas, often expressed in terms unfamiliar to end users. This scenario is challenging for LLM-based text-to-SQL tools, that is, tools that translate Natural Language (NL) sentences into SQL queries using a Large Language Model (LLM). Indeed, their accuracy on RW-RDBs is considerably lower than that reported on well-known synthetic benchmarks. This paper therefore introduces a technique, based on Retrieval-Augmented Generation, to improve the accuracy of LLM-based text-to-SQL tools on RW-RDBs. The technique consists of two steps. Using the RW-RDB schema, the first step generates a synthetic dataset E of pairs, each consisting of an NL sentence and its corresponding SQL translation. The core contribution of the paper is an algorithm that implements this first step. Given an input NL sentence, the second step retrieves from E the pairs whose NL sentences are most similar to the input sentence and includes them in the prompt to the LLM to improve accuracy. To evaluate the proposed technique, the paper reports experiments with an RW-RDB in production at an energy company, using a well-known text-to-SQL prompt strategy. It repeats the experiments with Mondial, an openly available database with a large schema. These experiments constitute the second contribution of the paper.
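The abstract does not specify the similarity measure or the prompt layout used in the second step, so the following is only a minimal sketch of retrieval-augmented few-shot prompting under stated assumptions. The dataset entries and table names are hypothetical placeholders (not the paper's E), the bag-of-words cosine similarity is a stand-in for whatever sentence embedding the authors may use, and the helpers `tokens`, `cosine`, `retrieve`, and `build_prompt` are illustrative names, not the paper's API.

```python
import math
import re
from collections import Counter

# Hypothetical synthetic dataset E of (NL sentence, SQL translation) pairs.
# In the technique, E is generated from the RW-RDB schema; these are placeholders.
E = [
    ("How many wells are in each field?",
     "SELECT field_id, COUNT(*) FROM well GROUP BY field_id"),
    ("List the names of all fields",
     "SELECT name FROM field"),
    ("What is the total production of each well in 2023?",
     "SELECT well_id, SUM(volume) FROM production WHERE year = 2023 GROUP BY well_id"),
]


def tokens(text: str) -> Counter:
    """Lower-cased bag of words; an assumed stand-in for a sentence embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, dataset, k: int = 2):
    """Return the k pairs whose NL sentence is most similar to the question."""
    q = tokens(question)
    ranked = sorted(dataset, key=lambda pair: cosine(q, tokens(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, schema: str, dataset, k: int = 2) -> str:
    """Assemble a few-shot prompt: schema, retrieved (NL, SQL) examples, then the question."""
    parts = [f"Database schema:\n{schema}\n"]
    for nl, sql in retrieve(question, dataset, k):
        parts.append(f"Question: {nl}\nSQL: {sql}\n")
    parts.append(f"Question: {question}\nSQL:")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("How many wells does each field have?",
                       "field(id, name); well(id, field_id)", E))
```

For the question "How many wells does each field have?", `retrieve` ranks the first pair highest because five of its words overlap with the question, and the resulting prompt ends with the input question followed by an empty `SQL:` slot for the LLM to complete. A realistic deployment would more plausibly use dense embeddings and a vector index for the similarity search; the retrieve-then-prompt structure would stay the same.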


Authors
Gustavo M. C. Coelho (Instituto Tecgraf, PUC-Rio, Rio de Janeiro, RJ 22451-900, Brazil)
Eduardo R. S. Nascimento (Instituto Tecgraf, PUC-Rio, Rio de Janeiro, RJ 22451-900, Brazil)
Yenier T. Izquierdo (Instituto Tecgraf, PUC-Rio, Rio de Janeiro, RJ 22451-900, Brazil)

Paper information

Publication year: 2024
Citations: 0
Country of publication: Brazil
Source: Springer