ITRT(IT Research Trends)

Advancing Natural Language to SQL: A Comparative Study of Open Source LLMs on Benchmark Datasets

Research field: Artificial Intelligence

Paper keywords: #databases #sql #database #wikisql #spider

Conference: 2025 IEEE Symposium on Computational Intelligence in Natural Language Processing and Social Media (CI-NLPSoMe Companion)

Abstract

Translating natural language to SQL (NL-to-SQL) allows users to communicate with databases in everyday language instead of complicated query syntax. This matters because it frees non-technical users from writing database queries by hand, making interactions easier. This work evaluates open-source language models on three key NL-to-SQL benchmark datasets: SPIDER, a cross-domain dataset; WikiSQL, a simpler single-domain dataset; and BIRD, a dataset comprising real-world data. These datasets were selected for the variety of problems they present: SPIDER tests how well models generalize over complex schemas and multi-table queries, WikiSQL concentrates on effective querying of straightforward single tables, and BIRD simulates noisy, real-world data scenarios where precise and effective SQL generation is essential. To tackle these challenges, we fine-tuned eight models from the Llama, Gemma, and Mistral LLM families on these datasets with consistent hyperparameters. We used an SQL evaluation framework that examines model generalization, considering both linguistic variety and complex query patterns, against a baseline of the accuracy and correctness of existing NL-to-SQL models. This study demonstrates how different models perform across datasets, with some performing better on simpler queries and others better suited to inconsistent, real-world data or difficult cross-domain tasks.

📄 Paper information

Publication year	2025
Citations	198
Country of publication	India
Site	IEEE

Ashwin K Sharma

Sri Charan Kanumuri

Pranav A R

Shylaja S Sharath
