A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL

Times cited: 2
Authors
Cao, Ruisheng [1]
Chen, Lu [1]
Li, Jieyu [1]
Zhang, Hanchong [1]
Xu, Hongshen [1]
Zhang, Wangyou [1]
Yu, Kai [1]
Affiliation
[1] Shanghai Jiao Tong Univ, X-LANCE Lab, MoE Key Lab Artificial Intelligence, Dept Comp Sci & Engn, AI Inst, Shanghai 200240, Peoples R China
Keywords
Structured Query Language; Decoding; Databases; Syntactics; Semantics; Task analysis; Computational modeling; Abstract syntax tree; grammar-based constrained decoding; heterogeneous graph neural network; knowledge-driven natural language processing; permutation invariant problem; text-to-SQL;
DOI
10.1109/TPAMI.2023.3298895
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Text-to-SQL is the task of converting a natural language utterance, together with the corresponding database schema, into a SQL program. The inputs naturally form a heterogeneous graph, while the output SQL can be transduced into an abstract syntax tree (AST). Traditional encoder-decoder models ignore higher-order semantics when encoding the heterogeneous graph and introduce permutation biases during AST construction; they are thus unable to exploit the refined structural knowledge precisely. In this work, we propose a generic heterogeneous graph to abstract syntax tree (HG2AST) framework that integrates dedicated structural knowledge into statistics-based models. On the encoder side, we leverage a line graph enhanced encoder (LGESQL) to iteratively update both node and edge features through dual graph message passing and aggregation. On the decoder side, a grammar-based decoder first constructs the equivalent SQL AST and then transforms it into the desired SQL via post-processing. To avoid over-fitting to permutation biases, we propose a golden tree-oriented learning (GTL) algorithm that adaptively controls the expanding order of AST nodes. The graph encoder and the tree decoder are combined into a unified framework through two auxiliary modules. Extensive experiments on various text-to-SQL datasets, covering single-/multi-table, single-/cross-domain, and multilingual settings, demonstrate the superiority and broad applicability of the framework.
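The "dual graph" on the encoder side pairs the input graph with its line graph. The following is a minimal sketch of line graph construction, assuming the heterogeneous input is given as directed (source, target, relation) triples. Every edge of the input graph becomes a node of the line graph, and edge (u, v) is linked to edge (v, w) because the two share v. Dropping the pair where w equals u (the edge running straight back), exposed here as the hypothetical `skip_backtracking` flag, is an assumption of this sketch; `build_line_graph` and the relation names in the toy input are illustrative only and are not taken from the LGESQL code.

```python
from collections import defaultdict


def build_line_graph(edges, skip_backtracking=True):
    """Build the line graph of a directed graph (illustrative sketch).

    edges: list of (src, dst, relation) triples. Edge i becomes node i of
    the line graph. Edge i = (u, v) gets a link to edge j = (v, w) because
    the head of i is the tail of j. With skip_backtracking=True, pairs
    whose w equals u (j runs straight back to u) are dropped.
    Returns a dict: edge index -> list of successor edge indices.
    """
    outgoing = defaultdict(list)  # node -> indices of edges leaving it
    for idx, (src, _dst, _rel) in enumerate(edges):
        outgoing[src].append(idx)

    line_adj = {i: [] for i in range(len(edges))}
    for i, (u, v, _rel) in enumerate(edges):
        for j in outgoing[v]:
            w = edges[j][1]
            if skip_backtracking and w == u:
                continue  # skip the backtracking pair (u, v) -> (v, u)
            line_adj[i].append(j)
    return line_adj


# Hypothetical toy input: question token q0 matches column c of table t.
edges = [
    ("q0", "c", "exact-match"),
    ("c", "q0", "exact-match-reverse"),
    ("c", "t", "belongs-to"),
    ("t", "c", "has-column"),
]
print(build_line_graph(edges))
```

For this toy input the result is {0: [2], 1: [], 2: [], 3: [1]}: the match edge (q0, c) feeds the edge (c, t) but not its own reverse (c, q0). Running message passing over both the node graph and this line graph is what lets edge (relation) features be updated iteratively rather than kept as static embeddings.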
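The permutation bias that GTL addresses can be illustrated with a toy sketch. When an AST node has children whose order does not matter (for example, the columns in a SELECT list), training against one fixed annotated order penalizes the model for emitting an equally valid permutation. The sketch assumes a simple greedy rule: at each step, the training target is whichever remaining golden child the current model scores highest. `greedy_order_nll`, `toy_score_fn`, and the preference values are hypothetical, and the paper's GTL algorithm may choose the expansion order differently.

```python
import math

# Hypothetical toy "model": fixed preferences over candidate SELECT columns,
# renormalized over the labels that have not been emitted yet.
PREFS = {"name": 0.5, "age": 0.3, "id": 0.2}


def toy_score_fn(history):
    """Return a probability for every label not yet in `history`."""
    left = {k: v for k, v in PREFS.items() if k not in history}
    total = sum(left.values())
    return {k: v / total for k, v in left.items()}


def greedy_order_nll(golden_children, score_fn):
    """Negative log-likelihood of an unordered set of golden children.

    Instead of following the annotated order, each step targets the
    remaining golden child to which the model assigns the highest
    probability (a greedy, model-driven expansion order).
    Returns (loss, order_used).
    """
    remaining = list(golden_children)
    history = []
    loss = 0.0
    while remaining:
        probs = score_fn(tuple(history))
        target = max(remaining, key=lambda c: probs[c])
        loss -= math.log(probs[target])
        history.append(target)
        remaining.remove(target)
    return loss, history


# Annotated order is ["age", "name"]; the model prefers "name" first.
loss, order = greedy_order_nll(["age", "name"], toy_score_fn)
print(order, round(loss, 4))
```

With these toy preferences the annotated order age, name is re-ordered to name, age, and the loss equals -ln 0.3 (about 1.20), lower than the loss of about 1.54 that the fixed annotated order would incur (-ln(0.3 x 0.5/0.7)). The greedy choice is only one simple way to avoid committing to a single canonical order.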
Pages: 13796-13813
Number of pages: 18