ITRT (IT Research Trends)

SeSQL: A High-Quality Large-Scale Session-Level Chinese Text-to-SQL Dataset

Research field: Databases

Paper keywords: #chinese #iterative #sql #028 #456

Conference: CCF International Conference on Natural Language Processing and Chinese Computing

Abstract

As the first session-level Chinese dataset, CHASE contains two separate parts: 2,003 sessions manually constructed from scratch (CHASE-C) and 3,456 sessions translated from the English SParC dataset (CHASE-T). We find that the two parts are highly discrepant and incompatible. In this work, we present SeSQL, a high-quality large-scale session-level Chinese text-to-SQL dataset consisting of 5,028 sessions, all manually constructed from scratch. Unlike previous datasets, and in order to guarantee data quality, we adopt an iterative annotation workflow that facilitates intense and timely review of previous-round natural language (NL) questions and SQL queries. Moreover, by completing all context-dependent NL questions, we obtain 27,012 context-independent question/SQL pairs, allowing SeSQL to be used as the largest dataset for single-round text-to-SQL parsing. We conduct benchmark session-level text-to-SQL parsing experiments on SeSQL using three competitive session-level parsers and present a detailed analysis.
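
The step from sessions to context-independent question/SQL pairs can be pictured with a minimal sketch. Everything here is an assumption made for illustration: the `Turn` and `Session` classes, their field names, and the sample questions are hypothetical and do not reflect the actual SeSQL file format or its data.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    # Hypothetical fields, not the real SeSQL schema.
    question: str            # NL question as asked within the session
    completed_question: str  # the same question rewritten to stand alone
    sql: str                 # SQL query answering the question


@dataclass
class Session:
    database_id: str
    turns: List[Turn]


def flatten_sessions(sessions: List[Session]) -> List[Tuple[str, str]]:
    """Turn session-level data into single-round (question, SQL) pairs.

    Each pair uses the completed, context-independent question, so it can
    be parsed without the earlier turns of the session.
    """
    return [
        (turn.completed_question, turn.sql)
        for session in sessions
        for turn in session.turns
    ]


# Made-up two-turn session; the second question depends on the first.
sessions = [
    Session(
        database_id="concert_demo",
        turns=[
            Turn(
                question="Which singers are older than 30?",
                completed_question="Which singers are older than 30?",
                sql="SELECT name FROM singer WHERE age > 30",
            ),
            Turn(
                question="Of those, who are from France?",
                completed_question="Which singers older than 30 are from France?",
                sql="SELECT name FROM singer WHERE age > 30 AND country = 'France'",
            ),
        ],
    )
]

pairs = flatten_sessions(sessions)
for question, sql in pairs:
    print(question, "->", sql)
```

In this sketch, a session-level parser would read the `question` fields in order together with the preceding turns, while a single-round parser would only see the flattened pairs.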

📄 Paper information

Publication year	2023
Citations	0
Country of publication	Andorra, China
Site	Springer

Saihao Huang

Lijie Wang

Zhenghua Li

Zeyang Liu

Chenhui Dou

Fukang Yan

Xinyan Xiao

Hua Wu

Min Zhang
