MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Authors
Dou, Longxu [1]
Gao, Yan [2]
Pan, Mingyang [1]
Wang, Dingzirui [1]
Che, Wanxiang [1]
Zhan, Dechen [1]
Lou, Jian-Guang [2]
Affiliations
[1] Harbin Inst Technol, Harbin, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
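Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract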
Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MULTISPIDER, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MULTISPIDER, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVE (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
Pages: 12745-12753
Number of pages: 9