ReFSQL: A Retrieval-Augmentation Framework for Text-to-SQL Generation

Authors
Zhang, Kun [1, 2]
Lin, Xiexiong [3]
Wang, Yuanzhuo [1, 2, 4]
Zhang, Xin [3]
Sun, Fei [1, 2]
Cen, Jianhe [4]
Jiang, Xuhui [1, 2]
Tan, Hexiang [1, 2]
Shen, Huawei [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Data Intelligence Syst Res Ctr, Beijing 100864, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing, Peoples R China
[3] Ant Grp, Hangzhou, Peoples R China
[4] Big Data Acad, Barcelona, Spain
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-SQL is the task of translating natural language questions into SQL queries. Existing methods directly align natural language with SQL and train a single encoder-decoder model to fit all questions. However, they underestimate the inherent structural characteristics of SQL, as well as the gap between specific structure knowledge and general knowledge, which leads to structural errors in the generated SQL. To address these challenges, we propose a retrieval-augmentation framework, ReFSQL. It consists of two parts: a structure-enhanced retriever and a generator. The structure-enhanced retriever is designed to identify samples with comparable specific knowledge in an unsupervised way. We then incorporate the SQL of the retrieved samples into the input, enabling the model to acquire prior knowledge of similar SQL grammar. To further bridge the gap between specific and general knowledge, we present a Mahalanobis contrastive learning method, which facilitates the transfer of the sample toward the specific knowledge distribution constructed by the retrieved samples. Experimental results on five datasets verify the effectiveness of our approach in improving the accuracy and robustness of Text-to-SQL generation. Our framework improves performance when combined with many other backbone models (including the 11B Flan-T5) and achieves state-of-the-art performance among existing methods that employ fine-tuning.
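The abstract speaks of moving a sample toward the distribution formed by the retrieved samples. As a rough illustration only, the sketch below computes a Mahalanobis-style distance from one vector to a small set of retrieved vectors. It is not the authors' implementation: the function name `diagonal_mahalanobis`, the toy two-dimensional vectors, and the diagonal-covariance simplification (each dimension scaled by its own variance, with no cross-dimension terms) are all assumptions made for this example.

```python
import math
from statistics import mean, pvariance


def diagonal_mahalanobis(x, samples, eps=1e-6):
    """Distance from x to the distribution of `samples`.

    Assumption: the covariance is treated as diagonal, so this is the
    variance-normalized special case of the Mahalanobis distance.
    `eps` guards against division by zero for constant dimensions.
    """
    dims = list(zip(*samples))                 # regroup values per dimension
    mu = [mean(d) for d in dims]               # per-dimension mean
    var = [pvariance(d) + eps for d in dims]   # per-dimension variance
    return math.sqrt(sum((xi - m) ** 2 / v for xi, m, v in zip(x, mu, var)))


# Hypothetical embeddings of retrieved samples (toy values).
retrieved = [[1.0, 2.0], [3.0, 2.5], [2.0, 1.5]]

near = diagonal_mahalanobis([2.0, 2.0], retrieved)  # sits at the mean
far = diagonal_mahalanobis([6.0, 5.0], retrieved)   # far from the cluster
print(near < far)
```

A contrastive objective built on such a distance would reward small distances to the retrieved distribution and large distances to unrelated ones; how ReFSQL defines that objective is not stated in this record.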
Pages: 664-673
Page count: 10