A Heterogeneous Graph to Abstract Syntax Tree Framework for Text-to-SQL

Cited by: 2
Authors
Cao, Ruisheng [1]
Chen, Lu [1]
Li, Jieyu [1]
Zhang, Hanchong [1]
Xu, Hongshen [1]
Zhang, Wangyou [1]
Yu, Kai [1]
Affiliation
[1] Shanghai Jiao Tong Univ, X-LANCE Lab, MoE Key Lab Artificial Intelligence, Dept Comp Sci & Engn, AI Inst, Shanghai 200240, Peoples R China
Keywords
Structured Query Language; Decoding; Databases; Syntactics; Semantics; Task analysis; Computational modeling; Abstract syntax tree; grammar-based constrained decoding; heterogeneous graph neural network; knowledge-driven natural language processing; permutation invariant problem; text-to-SQL
DOI
10.1109/TPAMI.2023.3298895
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Text-to-SQL is the task of converting a natural language utterance plus the corresponding database schema into a SQL program. The inputs naturally form a heterogeneous graph, while the output SQL can be transduced into an abstract syntax tree (AST). Traditional encoder-decoder models ignore higher-order semantics in heterogeneous graph encoding and introduce permutation biases during AST construction, and are thus unable to exploit the refined structural knowledge precisely. In this work, we propose a generic heterogeneous graph to abstract syntax tree (HG2AST) framework to integrate dedicated structural knowledge into statistics-based models. On the encoder side, we leverage a line graph enhanced encoder (LGESQL) to iteratively update both node and edge features through dual graph message passing and aggregation. On the decoder side, a grammar-based decoder first constructs the equivalent SQL AST and then transforms it into the desired SQL via post-processing. To avoid over-fitting permutation biases, we propose a golden tree-oriented learning (GTL) algorithm to adaptively control the expanding order of AST nodes. The graph encoder and tree decoder are combined into a unified framework through two auxiliary modules. Extensive experiments on various text-to-SQL datasets, including single/multi-table, single/cross-domain, and multilingual settings, demonstrate the superiority and broad applicability of the framework.
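The line graph mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the node names, relation labels, and the helper `build_line_graph` are hypothetical, and the sketch only shows the general idea that each edge of the original graph becomes a node of the line graph, with two such nodes connected when one edge ends where the other starts.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (source node, target node, relation label); all names are illustrative assumptions
Edge = Tuple[str, str, str]


def build_line_graph(edges: List[Edge]) -> Dict[int, List[int]]:
    """Map each edge index to the indices of edges that start where it ends."""
    # Group edge indices by their source node.
    outgoing = defaultdict(list)
    for idx, (src, _dst, _rel) in enumerate(edges):
        outgoing[src].append(idx)

    # An edge (u -> v) is linked to every other edge leaving v.
    line_graph: Dict[int, List[int]] = {}
    for idx, (_src, dst, _rel) in enumerate(edges):
        line_graph[idx] = [j for j in outgoing[dst] if j != idx]
    return line_graph


# Toy heterogeneous graph: one question token, one table, two columns.
edges = [
    ("q:singers", "t:singer", "question-table-match"),
    ("t:singer", "c:singer.name", "table-has-column"),
    ("t:singer", "c:singer.age", "table-has-column"),
]
line_graph = build_line_graph(edges)
print(line_graph)
```

In this toy case the question-to-table edge is linked to both table-to-column edges, which is the kind of edge-to-edge neighborhood over which edge features could be exchanged alongside ordinary node message passing.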
Pages: 13796-13813
Page count: 18