ReFSQL: A Retrieval-Augmentation Framework for Text-to-SQL Generation

Cited by: 0
Authors
Zhang, Kun [1 ,2 ]
Lin, Xiexiong [3 ]
Wang, Yuanzhuo [1 ,2 ,4 ]
Zhang, Xin [3 ]
Sun, Fei [1 ,2 ]
Cen, Jianhe [4 ]
Jiang, Xuhui [1 ,2 ]
Tan, Hexiang [1 ,2 ]
Shen, Huawei [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Data Intelligence Syst Res Ctr, Beijing 100864, Peoples R China
[2] Univ Chinese Acad Sci, Sch Comp Sci & Technol, Beijing, Peoples R China
[3] Ant Grp, Hangzhou, Peoples R China
[4] Big Data Acad, Barcelona, Spain
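Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes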
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-to-SQL is the task of translating natural language questions into SQL queries. Existing methods directly align natural language with SQL and train a single encoder-decoder model to fit all questions. However, they underestimate the inherent structural characteristics of SQL, as well as the gap between specific structure knowledge and general knowledge. This leads to structural errors in the generated SQL. To address these challenges, we propose a retrieval-augmentation framework, namely ReFSQL. It contains two parts: a structure-enhanced retriever and a generator. The structure-enhanced retriever is designed to identify samples with comparable specific knowledge in an unsupervised way. Subsequently, we incorporate the retrieved samples' SQL into the input, enabling the model to acquire prior knowledge of similar SQL grammar. To further bridge the gap between specific and general knowledge, we present a Mahalanobis contrastive learning method, which facilitates the transfer of the sample toward the specific knowledge distribution constructed by the retrieved samples. Experimental results on five datasets verify the effectiveness of our approach in improving the accuracy and robustness of Text-to-SQL generation. Our framework achieves improved performance when combined with many other backbone models (including the 11B Flan-T5) and also achieves state-of-the-art performance compared to existing fine-tuning-based methods.
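The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of the distance measure it names, not the authors' method. It computes the Mahalanobis distance from one sample embedding to the distribution formed by a set of retrieved sample embeddings. The function names, the 2-D toy embeddings, and the small `ridge` term added for numerical stability are all assumptions made for illustration.

```python
import math


def mean_vector(samples):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(samples)
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / n for i in range(dim)]


def covariance_2d(samples, ridge=1e-6):
    """Sample covariance of 2-D points, returned as (sxx, sxy, syy).

    `ridge` is a hypothetical stabiliser added to the diagonal so the
    matrix stays invertible when the retrieved samples are nearly collinear.
    """
    mx, my = mean_vector(samples)
    n = len(samples)
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1) + ridge
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1) + ridge
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return sxx, sxy, syy


def mahalanobis_2d(point, samples, ridge=1e-6):
    """Mahalanobis distance from `point` to the distribution of `samples`."""
    mx, my = mean_vector(samples)
    sxx, sxy, syy = covariance_2d(samples, ridge)
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mx, point[1] - my
    # (d^T S^-1 d) using the closed-form inverse of a symmetric 2x2 matrix.
    squared = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(max(squared, 0.0))


# Toy "retrieved" embeddings: wide spread along y, narrow along x.
retrieved = [(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]
along_x = mahalanobis_2d((1.0, 0.0), retrieved)
along_y = mahalanobis_2d((0.0, 1.0), retrieved)
```

In this toy setup both query points are at Euclidean distance 1 from the mean, yet `along_y` is smaller than `along_x` because the retrieved samples vary more along the y axis. A contrastive loss built on such a distance would pull a sample toward the retrieved distribution while accounting for its shape.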
Pages: 664-673
Page count: 10
Related papers
50 in total
  • [31] Uncovering and Categorizing Social Biases in Text-to-SQL
    Liu, Yan
    Gao, Yan
    Su, Zhe
    Chen, Xiaokang
    Ash, Elliott
    Lou, Jian-Guang
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 13573 - 13584
  • [32] Improving Text-to-SQL with a Hybrid Decoding Method
    Jeong, Geunyeong
    Han, Mirae
    Kim, Seulgi
    Lee, Yejin
    Lee, Joosang
    Park, Seongsik
    Kim, Harksoo
    ENTROPY, 2023, 25 (03)
  • [33] Integrating Question Answering and Text-to-SQL in Portuguese
    Jose, Marcos Menon
    Jose, Marcelo Archanjo
    Maua, Denis Deratani
    Cozman, Fabio Gagliardi
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 278 - 287
  • [34] Error Detection for Text-to-SQL Semantic Parsing
    Chen, Shijie
    Chen, Ziru
    Sun, Huan
    Su, Yu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11730 - 11743
  • [35] An Exploratory Study on Model Compression for Text-to-SQL
    Sun, Shuo
    Gao, Yuze
    Zhang, Yuchen
    Su, Jian
    Chen, Bin
    Lin, Yingzhan
    Sun, Shuqi
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023), 2023, : 11647 - 11654
  • [36] Structure-Grounded Pretraining for Text-to-SQL
    Deng, Xiang
    Awadallah, Ahmed Hassan
    Meek, Christopher
    Polozov, Oleksandr
    Sun, Huan
    Richardson, Matthew
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1337 - 1350
  • [37] An In-Depth Benchmarking of Text-to-SQL Systems
    Gkini, Orest
    Belmpas, Theofilos
    Koutrika, Georgia
    Ioannidis, Yannis
    SIGMOD '21: PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2021, : 632 - 644
  • [38] A survey on deep learning approaches for text-to-SQL
    Katsogiannis-Meimarakis, George
    Koutrika, Georgia
    The VLDB Journal, 2023, 32 : 905 - 936
  • [39] Towards Text-to-SQL over Aggregate Tables
    Li, Shuqin
    Zhou, Kaibin
    Zhuang, Zeyang
    Wang, Haofen
    Ma, Jun
    Data Intelligence, 2023, 5 (02) : 457 - 474
  • [40] Ar-Spider: Text-to-SQL in Arabic
    Almohaimeed, Saleh
    Almohaimeed, Saad
    Al Ghanim, Mansour
    Wang, Liqiang
    39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024, 2024, : 1024 - 1030