Structure-Grounded Pretraining for Text-to-SQL

Authors
Deng, Xiang [1, 2]
Awadallah, Ahmed Hassan [2]
Meek, Christopher [2]
Polozov, Oleksandr [2]
Sun, Huan [1]
Richardson, Matthew [2]
Affiliations
[1] Ohio State University, Columbus, OH 43210, USA
[2] Microsoft Research, Redmond, WA, USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning to capture text-table alignment is essential for tasks like text-to-SQL. A model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel pretraining tasks (column grounding, value grounding, and column-value mapping) and leverage them to pretrain a text-table encoder. Additionally, to evaluate different methods under more realistic text-table alignment settings, we create a new evaluation set, Spider-Realistic, based on the Spider dev set with explicit mentions of column names removed, and adopt eight existing text-to-SQL datasets for cross-database evaluation. STRUG brings significant improvement over BERT-LARGE in all settings. Compared with existing pretraining methods such as GRAPPA, STRUG achieves similar performance on Spider and outperforms all baselines on the more realistic sets. All the code and data used in this work are publicly available at https://aka.ms/strug.
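The three pretraining tasks can be made concrete by showing how weak labels for them might be read off a single sentence-table pair. The sketch below is a minimal illustration, not the STRUG implementation. It assumes that each sentence in the parallel corpus comes with a set of highlighted table cells, that a column is positive for column grounding whenever one of its cells is highlighted, and that a cell value is grounded wherever its tokens appear verbatim (case-insensitively) in the sentence. `Cell` and `derive_weak_labels` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """A highlighted table cell (hypothetical format): its column and its text value."""
    column: str
    value: str


def derive_weak_labels(
    sentence: str, columns: List[str], highlighted: List[Cell]
) -> Tuple[List[int], List[int], List[Optional[int]]]:
    """Derive illustrative weak labels for the three pretraining tasks.

    Returns:
      column_labels: per column, 1 if a highlighted cell comes from it (column grounding)
      value_tags: per sentence token, 1 if it is part of a matched cell value (value grounding)
      value_columns: per token, index of the column its value came from, else None
                     (column-value mapping)
    """
    tokens = sentence.lower().split()
    col_index = {name: i for i, name in enumerate(columns)}

    # Column grounding: assumed positive for any column with a highlighted cell.
    grounded_cols = {cell.column for cell in highlighted}
    column_labels = [1 if name in grounded_cols else 0 for name in columns]

    value_tags = [0] * len(tokens)
    value_columns: List[Optional[int]] = [None] * len(tokens)
    for cell in highlighted:
        cell_tokens = cell.value.lower().split()
        n = len(cell_tokens)
        if n == 0:
            continue  # an empty cell value cannot be matched in the sentence
        # Exact, case-insensitive token matching: an assumption made for this sketch.
        for start in range(len(tokens) - n + 1):
            if tokens[start:start + n] == cell_tokens:
                for i in range(start, start + n):
                    value_tags[i] = 1                       # value grounding tag
                    value_columns[i] = col_index[cell.column]  # column-value mapping
    return column_labels, value_tags, value_columns


cols, tags, mapping = derive_weak_labels(
    "The population of Columbus is 905,748",
    ["city", "population", "state"],
    [Cell("city", "Columbus"), Cell("population", "905,748")],
)
print(cols)     # [1, 1, 0]
print(tags)     # [0, 0, 0, 1, 0, 1]
print(mapping)  # [None, None, None, 0, None, 1]
```

Exact token matching keeps the sketch short but misses paraphrased or reformatted values (for example "905 thousand"); a practical labeler would likely need normalization or fuzzy matching, which is left out here.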
Pages: 1337-1350
Number of pages: 14