MultiSpider: Towards Benchmarking Multilingual Text-to-SQL Semantic Parsing

Cited by: 0
Authors
Dou, Longxu [1]
Gao, Yan [2]
Pan, Mingyang [1]
Wang, Dingzirui [1]
Che, Wanxiang [1]
Zhan, Dechen [1]
Lou, Jian-Guang [2]
Affiliations
[1] Harbin Inst Technol, Harbin, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Text-to-SQL semantic parsing is an important NLP task, which greatly facilitates the interaction between users and the database and becomes the key component in many human-computer interaction systems. Much recent progress in text-to-SQL has been driven by large-scale datasets, but most of them are centered on English. In this work, we present MULTISPIDER, the largest multilingual text-to-SQL dataset which covers seven languages (English, German, French, Spanish, Japanese, Chinese, and Vietnamese). Upon MULTISPIDER, we further identify the lexical and structural challenges of text-to-SQL (caused by specific language properties and dialect sayings) and their intensity across different languages. Experimental results under three typical settings (zero-shot, monolingual and multilingual) reveal a 6.1% absolute drop in accuracy in non-English languages. Qualitative and quantitative analyses are conducted to understand the reason for the performance drop of each language. Besides the dataset, we also propose a simple schema augmentation framework SAVE (Schema-Augmentation-with-Verification), which significantly boosts the overall performance by about 1.8% and closes the 29.5% performance gap across languages.
Pages: 12745 - 12753
Page count: 9
Related papers
50 in total
  • [21] Benchmarking and Improving Text-to-SQL Generation under Ambiguity
    Bhaskar, Adithya
    Tomar, Tushar
    Sathe, Ashutosh
    Sarawagi, Sunita
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 7053 - 7074
  • [22] On Modern Text-to-SQL Semantic Parsing Methodologies for Natural Language Interface to Databases: A Comparative Study
    Visperas, Moses
    Adoptante, Aunhel John
    Borjal, Christalline Joie
    Abia, Ma. Teresita
    Catapang, Jasper Kyle
    Peramo, Elmer
    2023 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION, ICAIIC, 2023, : 390 - 396
  • [23] UniSAr: a unified structure-aware autoregressive language model for text-to-SQL semantic parsing
    Longxu Dou
    Yan Gao
    Mingyang Pan
    Dingzirui Wang
    Wanxiang Che
    Jian-Guang Lou
    Dechen Zhan
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 4361 - 4376
  • [24] Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion
    Zhao, Chen
    Su, Yu
    Pauls, Adam
    Platanios, Emmanouil Antonios
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 5568 - 5578
  • [25] RESDSQL: Decoupling Schema Linking and Skeleton Parsing for Text-to-SQL
    Li, Haoyang
    Zhang, Jing
    Li, Cuiping
    Chen, Hong
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 11, 2023, : 13067 - 13075
  • [26] Global Reasoning over Database Structures for Text-to-SQL Parsing
    Bogin, Ben
    Gardner, Matt
    Berant, Jonathan
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 3659 - 3664
  • [27] Leveraging Large Language Model for Enhanced Text-to-SQL Parsing
    Zhan, Zecheng
    Haihong, E.
    Song, Meina
    IEEE ACCESS, 2025, 13 : 30497 - 30504
  • [28] Exploring the Compositional Generalization in Context Dependent Text-to-SQL Parsing
    Liu, Aiwei
    Liu, Wei
    Hu, Xuming
    Li, Shu'ang
    Ma, Fukun
    Yang, Yawen
    Wen, Lijie
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 688 - 700
  • [29] Hierarchical Schema Representation for Text-to-SQL Parsing With Decomposing Decoding
    Song, Meina
    Zhan, Zecheng
    E, Haihong
    IEEE ACCESS, 2019, 7 : 103706 - 103715
  • [30] DuoRAT: Towards Simpler Text-to-SQL Models
    Scholak, Torsten
    Li, Raymond
    Bahdanau, Dzmitry
    de Vries, Harm
    Pal, Chris
    2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021), 2021, : 1313 - 1321