AttractionDetailsQA: An Attraction Details Focused on Chinese Question Answering Dataset

Cited by: 1
Authors
Huang, Weiming [1, 2]
Xu, Shiting [3]
Wang, Yuhan [4]
Jin, Fan [1, 2]
Chang, Qingling [1, 2]
Affiliations
[1] Wuyi Univ, Fac Intelligent Mfg, Jiangmen 529000, Peoples R China
[2] China Germany Artificial Intelligence Inst Jiangm, Jiangmen 529000, Peoples R China
[3] Zhuhai 4DAGE Technol Co Ltd, Zhuhai 519000, Peoples R China
[4] Jiangsu Univ Sci & Technol, Sch Naval Architecture & Ocean Engn, Zhenjiang 212003, Jiangsu, Peoples R China
Source
IEEE ACCESS | 2022 / Vol. 10
Keywords
Annotations; Data models; Question answering (information retrieval); Manuals; Layout; Benchmark testing; Tourism industry; Attraction detail dataset; question-answering pair generation
DOI
10.1109/ACCESS.2022.3181188
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
With the growing number of domestic tourists and the spread of digital upgrades at attractions, it is crucial to develop a question-answering (QA) system that covers attraction details. However, there has been little work on attraction QA, and the main bottleneck is the lack of available datasets. While previous QA datasets, such as CNN/DAILYMAIL and NewsQA, usually focus on the news domain, we present the first large-scale dataset for QA over attraction details. To ensure that the collected data are useful, we gather data only from public travel information websites. Unlike other QA datasets such as SQuAD, which are labeled manually, we built the dataset by combining manual annotation with a question-answer pair generation (QAG) annotation model. The resulting dataset covers 2,808 attractions with a total of 18,245 QA pairs and includes seven types of attraction details: location, time, component, area, layout, rating, and character. The dataset is available at https://github.com/wyman130/AttractionDetailsQA. Because QAG has received little study for attraction details, we ran experiments with several QAG models on this dataset to establish a benchmark. This provides a basis for subsequent improvements to the dataset and for research on QAG in attraction details.
Pages: 86215 - 86221
Number of pages: 7
Related papers
50 in total
  • [41] PersianQuAD: The Native Question Answering Dataset for the Persian Language
    Kazemi, Arefeh
    Mozafari, Jamshid
    Nematbakhsh, Mohammad Ali
    IEEE Access, 2022, 10 : 26045 - 26057
  • [42] PersianQuAD: The Native Question Answering Dataset for the Persian Language
    Kazemi, Arefeh
    Mozafari, Jamshid
    Nematbakhsh, Mohammad Ali
    IEEE ACCESS, 2022, 10 : 26045 - 26057
  • [43] TheoremQA: A Theorem-driven Question Answering Dataset
    Chen, Wenhu
    Yin, Ming
    Ku, Max
    Lu, Pan
    Wan, Yixin
    Ma, Xueguang
    Xu, Jianyu
    Wang, Xinyi
    Xia, Tony
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 7889 - 7901
  • [44] DAWQAS: A Dataset for Arabic Why Question Answering System
    Ismail, Walaa Saber
    Homsi, Masun Nabhan
    ARABIC COMPUTATIONAL LINGUISTICS, 2018, 142 : 123 - 131
  • [45] QASC: A Dataset for Question Answering via Sentence Composition
    Khot, Tushar
    Clark, Peter
    Guerquin, Michal
    Jansen, Peter
    Sabharwal, Ashish
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 8082 - 8090
  • [46] A dataset for medical instructional video classification and question answering
    Deepak Gupta
    Kush Attal
    Dina Demner-Fushman
    Scientific Data, 10
  • [47] MultiSpanQA: A Dataset for Multi-Span Question Answering
    Li, Haonan
    Vasardani, Maria
    Tomko, Martin
    Baldwin, Timothy
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 1250 - 1260
  • [48] ToolQA: A Dataset for LLM Question Answering with External Tools
    Zhuang, Yuchen
    Yu, Yue
    Wang, Kuan
    Sun, Haotian
    Zhang, Chao
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [49] Chinese question-answering system
    Huang, GT
    Yao, HH
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2004, 19 (04) : 479 - 488
  • [50] BiRdQA: A Bilingual Dataset for Question Answering on Tricky Riddles
    Zhang, Yunxiang
    Wan, Xiaojun
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11748 - 11756