SleepQA: A Health Coaching Dataset on Sleep for Extractive Question Answering

Authors
Bojic, Iva [1]
Ong, Qi Chwen [1]
Thakkar, Megh [1]
Kamran, Esha [2]
Le Shua, Irving Yu [1]
Pang, Jaime Rei Ern [1]
Chen, Jessica [2]
Nayak, Vaaruni [2]
Joty, Shafiq [1, 3]
Car, Josip [1, 2]
Affiliations
[1] Nanyang Technol Univ, Singapore, Singapore
[2] Imperial Coll London, London, England
[3] Salesforce Res, Washington, DC USA
Keywords
Factual Question Answering; Dense Passage Retrieval; Evidence-based Knowledge; Domain-specific Natural Language Processing;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Question Answering (QA) systems can support health coaches in facilitating clients' lifestyle behavior changes (e.g., in adopting healthy sleep habits). In this paper, we design a domain-specific QA pipeline for sleep coaching. To this end, we release SleepQA, a dataset created from 7,005 passages comprising 4,250 training examples with single annotations and 750 examples with 5-way annotations. We fine-tune different domain-specific BERT models on our dataset and perform extensive automatic and human evaluation of the resulting end-to-end QA pipeline. Comparisons of our pipeline with the baseline show improvements in domain-specific natural language processing on real-world questions. We hope that this dataset will lead to wider research interest in this important health domain.
Pages: 199 - 217
Number of pages: 19