LLM-Based Text-to-SQL for Real-World Databases


Research field: Databases



Venue: SN Computer Science


Abstract

Text-to-SQL refers to the task defined as “given a relational database D and a natural language sentence S that describes a question on D, generate an SQL query Q over D that expresses S”. Several LLM-based text-to-SQL tools, that is, text-to-SQL tools that use Large Language Models (LLMs), have emerged and outperformed previous approaches on well-known benchmarks. This article first shows that the performance of a selected set of LLM-based text-to-SQL tools is, however, significantly lower when run on two challenging databases with a large number of tables, columns, and foreign keys. A closer analysis reveals that one of the problems lies in the fact that the relational schema is an inappropriate specification of the database from the point of view of the LLM. The article then introduces database specifications based on LLM-friendly views, which are close to the language of the users’ questions and eliminate frequently used joins, and on LLM-friendly data descriptions of the database values. The article proceeds to show that the use of a set of LLM-friendly views and data samples considerably improves the performance of a text-to-SQL prompt strategy over a real-world database. This result suggests that real-world databases require rethinking how schema specifications should be passed to the LLM to recover state-of-the-art performance.
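
To make the idea of an LLM-friendly view concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical two-table SQLite schema (`equip`, `loc`) with terse column names, defines a view (`equipment`) whose column names are closer to user vocabulary and which pre-computes the join, and builds a prompt fragment from the view's columns plus a few sample rows. All table, column, view, and function names here are illustrative assumptions.

```python
import sqlite3

# Hypothetical toy schema; every name below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE loc (loc_id INTEGER PRIMARY KEY, loc_nm TEXT);
CREATE TABLE equip (eq_id INTEGER PRIMARY KEY, eq_nm TEXT, loc_id INTEGER);
INSERT INTO loc VALUES (1, 'Platform A'), (2, 'Platform B');
INSERT INTO equip VALUES (10, 'Pump P-101', 1), (11, 'Valve V-7', 2);

-- The view renames columns toward user vocabulary and hides the join.
CREATE VIEW equipment AS
SELECT e.eq_nm AS equipment_name, l.loc_nm AS location_name
FROM equip e JOIN loc l ON e.loc_id = l.loc_id;
""")


def describe_view(conn, view, n_samples=2):
    """Return a prompt fragment: the view's columns plus a few sample rows."""
    # PRAGMA table_info also works on views; index 1 is the column name.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({view})")]
    rows = conn.execute(f"SELECT * FROM {view} LIMIT {n_samples}").fetchall()
    lines = [f"View {view}({', '.join(cols)})", "Sample rows:"]
    lines += [f"  {row}" for row in rows]
    return "\n".join(lines)


# Text that would be placed in the LLM prompt instead of the raw schema.
prompt_schema = describe_view(conn, "equipment")

# A query over the view needs no join, unlike one over equip and loc.
result = conn.execute(
    "SELECT location_name FROM equipment WHERE equipment_name = 'Pump P-101'"
).fetchall()
```

In this sketch, the SQL the LLM has to generate is a single-table selection over `equipment`, whereas the same question over the base tables would require it to infer the `equip`/`loc` join and the abbreviated column names.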


Authors

Eduardo R. Nascimento
Instituto Tecgraf, PUC-Rio, Rio de Janeiro 22451-900, RJ, Brazil

Grettel García
Instituto Tecgraf, PUC-Rio, Rio de Janeiro 22451-900, RJ, Brazil

Yenier T. Izquierdo
Instituto Tecgraf, PUC-Rio, Rio de Janeiro 22451-900, RJ, Brazil

Paper information

Publication year: 2025
Citations: 2
Country of publication: Brazil
Site: Springer