Structure-Grounded Pretraining for Text-to-SQL

Authors
Deng, Xiang [1, 2]
Awadallah, Ahmed Hassan [2 ]
Meek, Christopher [2 ]
Polozov, Oleksandr [2 ]
Sun, Huan [1 ]
Richardson, Matthew [2 ]
Affiliations
[1] Ohio State Univ, Columbus, OH 43210 USA
[2] Microsoft Res, Redmond, WA USA
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Learning to capture text-table alignment is essential for tasks like text-to-SQL. A model needs to correctly recognize natural language references to columns and values and to ground them in the given database schema. In this paper, we present a novel weakly supervised Structure-Grounded pretraining framework (STRUG) for text-to-SQL that can effectively learn to capture text-table alignment based on a parallel text-table corpus. We identify a set of novel pretraining tasks: column grounding, value grounding and column-value mapping, and leverage them to pretrain a text-table encoder. Additionally, to evaluate different methods under more realistic text-table alignment settings, we create a new evaluation set, Spider-Realistic, based on the Spider dev set with explicit mentions of column names removed, and adopt eight existing text-to-SQL datasets for cross-database evaluation. STRUG brings significant improvement over BERT-LARGE in all settings. Compared with existing pretraining methods such as GRAPPA, STRUG achieves similar performance on Spider, and outperforms all baselines on more realistic sets. All the code and data used in this work are publicly available at https://aka.ms/strug.
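As a rough illustration of the three pretraining tasks named in the abstract, the sketch below derives toy labels for column grounding, value grounding and column-value mapping from one text-table pair. The abstract does not say how the weak supervision is obtained, so the exact string matching used here, the function name weak_labels and the WeakLabels container are all hypothetical and are not the authors' method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class WeakLabels:
    column_grounding: list[int]           # 1 if the column is referenced by the text
    value_grounding: list[int]            # 1 if the token is part of a cell value
    column_value_mapping: dict[int, int]  # token index -> column index


def weak_labels(tokens: list[str], columns: list[str], row: dict) -> WeakLabels:
    """Toy labeler: exact, case-insensitive matching of cell values (an assumption)."""
    lowered = [t.lower() for t in tokens]
    column_grounding = [0] * len(columns)
    value_grounding = [0] * len(tokens)
    mapping: dict[int, int] = {}
    for col_idx, col in enumerate(columns):
        cell = str(row[col]).lower().split()
        n = len(cell)
        if n == 0:
            continue
        # Slide a window over the text and look for the cell value.
        for start in range(len(lowered) - n + 1):
            if lowered[start:start + n] == cell:
                column_grounding[col_idx] = 1
                for tok_idx in range(start, start + n):
                    value_grounding[tok_idx] = 1
                    mapping[tok_idx] = col_idx
    return WeakLabels(column_grounding, value_grounding, mapping)


if __name__ == "__main__":
    tokens = "which films were directed by Ang Lee in 2005".split()
    columns = ["title", "director", "year"]
    row = {"title": "Brokeback Mountain", "director": "Ang Lee", "year": 2005}
    print(weak_labels(tokens, columns, row))
```

In this example the text never mentions the column names "director" or "year", yet both columns are marked as grounded because their cell values appear in the text. That is the kind of implicit reference the Spider-Realistic setting is described as targeting.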
Pages: 1337-1350
Number of pages: 14