Structure-Grounded Pretraining for Text-to-SQL

Authors
Deng, Xiang [1, 2]
Awadallah, Ahmed Hassan [2]
Meek, Christopher [2]
Polozov, Oleksandr [2]
Sun, Huan [1]
Richardson, Matthew [2]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Microsoft Res, Redmond, WA USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning to capture text-table alignment is essential for tasks like text-to-SQL. A model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel pretraining tasks (column grounding, value grounding, and column-value mapping) and leverage them to pretrain a text-table encoder. Additionally, to evaluate different methods under more realistic text-table alignment settings, we create a new evaluation set, Spider-Realistic, based on the Spider dev set with explicit mentions of column names removed, and adopt eight existing text-to-SQL datasets for cross-database evaluation. STRUG brings significant improvement over BERT-LARGE in all settings. Compared with existing pretraining methods such as GRAPPA, STRUG achieves similar performance on Spider and outperforms all baselines on the more realistic sets. All the code and data used in this work are publicly available at https://aka.ms/strug.
Pages: 1337-1350
Page count: 14