XLMRQA: Open-Domain Question Answering on Vietnamese Wikipedia-Based Textual Knowledge Source

Cited by: 1
Authors
Kiet Van Nguyen [1, 2]
Phong Nguyen-Thuan Do [2]
Nhat Duy Nguyen [2]
Tin Van Huynh [1, 2]
Anh Gia-Tuan Nguyen [1, 2]
Ngan Luu-Thuy Nguyen [1, 2]
Affiliations
[1] University of Information Technology, Faculty of Information Science and Engineering, Ho Chi Minh City, Vietnam
[2] Vietnam National University, Ho Chi Minh City, Vietnam
Keywords
Question answering; Transformer; BERT; XLM-R; Transfer learning; Machine reading comprehension;
DOI
10.1007/978-3-031-21743-2_30
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction. It has attracted considerable attention from the computational linguistics and artificial intelligence research communities in recent years, driven by the rapid development of models based on machine reading comprehension (MRC). A reader-based QA system is a high-level search engine that uses MRC techniques to find correct answers to queries or questions in open-domain or domain-specific texts. Most advances in data resources and machine-learning approaches for MRC and QA systems have been made in two resource-rich languages, English and Chinese, whereas research on QA systems for a low-resource language like Vietnamese remains scarce. This paper presents XLMRQA, the first Vietnamese QA system using a supervised transformer-based reader on a Wikipedia-based textual knowledge source (using the UIT-ViQuAD corpus), outperforming two robust QA systems built on deep neural network models, DrQA and BERTserini, by 24.46% and 6.28%, respectively. Based on the results obtained from the three systems, we analyze the influence of question types on the performance of the QA systems.
Pages: 377-389
Page count: 13