SQuAD-SRC: A Dataset for Multi-Accent Spoken Reading Comprehension

Authors
Tang, Yixuan [1]
Tung, Anthony K. H. [1]
Affiliations
[1] National University of Singapore, Department of Computer Science, Singapore, Singapore
Funding
National Research Foundation, Singapore
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Spoken Reading Comprehension (SRC) is a challenging problem in spoken natural language retrieval: given a question in audio form, a system must automatically extract the answer from text-form content. However, existing spoken question answering approaches are mainly based on synthetically generated audio data, so they may not transfer effectively to multi-accent spoken question answering in many real-world applications. In this paper, we construct SQuAD-SRC, a large-scale multi-accent human-spoken dataset, in order to study the problem of multi-accent spoken reading comprehension. We choose 24 native English speakers from six different countries with various English accents, and the chosen speakers record audio-form questions for the corresponding text-form content. The dataset consists of 98,169 spoken question-answer pairs and 20,963 passages from the popular machine reading comprehension dataset SQuAD. We present a statistical analysis of SQuAD-SRC and conduct extensive experiments on it, comparing cascaded SRC approaches with enhanced end-to-end ones. Moreover, we explore various adaptation strategies to improve SRC performance, especially for multi-accent spoken questions.
Pages: 5206-5214
Number of pages: 9