DISFL-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

Cited by: 0
Authors
Gupta, Aditya [1]
Xu, Jiacheng [2, 4]
Upadhyay, Shyam [1]
Yang, Diyi [3]
Faruqui, Manaal [1]
Affiliations
[1] Google Assistant, Mountain View, CA USA
[2] Univ Texas Austin, Austin, TX 78712 USA
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
[4] Google, Mountain View, CA USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Disfluency is an under-studied topic in NLP, even though it is ubiquitous in human conversation. This is largely due to the lack of datasets containing disfluencies. In this paper, we present a new challenge question answering dataset, DISFL-QA, a derivative of SQuAD, where humans introduce contextual disfluencies in previously fluent questions. DISFL-QA contains a variety of challenging disfluencies that require a more comprehensive understanding of the text than what was necessary in prior datasets. Experiments show that the performance of existing state-of-the-art question answering models degrades significantly when tested on DISFL-QA in a zero-shot setting. We show that data augmentation methods partially recover the loss in performance and also demonstrate the efficacy of using gold data for fine-tuning. We argue that large-scale disfluency datasets are needed for NLP models to be robust to disfluencies. The dataset is publicly available at: https://github.com/google-research-datasets/disfl-qa.
Pages: 3309-3319
Page count: 11