AttractionDetailsQA: An Attraction Details Focused on Chinese Question Answering Dataset

Cited by: 1
Authors
Huang, Weiming [1,2]
Xu, Shiting [3]
Wang Yuhan [4]
Jin Fan [1,2]
Chang, Qingling [1,2]
Affiliations
[1] Wuyi Univ, Fac Intelligent Mfg, Jiangmen 529000, Peoples R China
[2] China Germany Artificial Intelligence Inst Jiangm, Jiangmen 529000, Peoples R China
[3] Zhuhai 4DAGE Technol Co Ltd, Zhuhai 519000, Peoples R China
[4] Jiangsu Univ Sci & Technol, Sch Naval Architecture & Ocean Engn, Zhenjiang 212003, Jiangsu, Peoples R China
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Annotations; Data models; Question answering (information retrieval); Manuals; Layout; Benchmark testing; Tourism industry; Attraction detail dataset; question-answering pair generation
DOI
10.1109/ACCESS.2022.3181188
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the growing number of domestic tourists and the increasing digitalization of attractions, it is crucial to develop a question-answering (QA) system about the details of attractions. However, there is little work on attraction QA, and the main bottleneck is the lack of available datasets. While previous QA datasets, such as CNN/DAILYMAIL and NewsQA, usually focus on the news domain, we present the first large-scale dataset for QA over attraction details. To ensure that the collected data are useful, we gather data only from public travel information websites. Unlike other QA datasets such as SQuAD, which are labeled manually, we built the dataset by combining manual annotation with annotation by a question-answer pair generation (QAG) model. The resulting dataset covers 2,808 attractions with a total of 18,245 QA pairs, spanning seven types of attraction details: location, time, component, area, layout, rating, and character. The dataset is available at https://github.com/wyman130/AttractionDetailsQA. Because QAG has not been widely studied for attraction details, we evaluated several QAG models on this dataset to obtain a benchmark. This provides a basis for subsequent improvements to the dataset and for research on QAG in attraction details.
Pages: 86215-86221
Number of pages: 7