A Serbian Question Answering Dataset Created by Using the Web Scraping Technique

Authors
Cenic, Aleksandar B. [1]
Stojkovic, Suzana [1]
Affiliations
[1] University of Niš, Faculty of Electronic Engineering, Aleksandra Medvedeva 14, 18000 Niš, Serbia
Keywords
Question answering system; Web scraping; Question answering dataset; Extraction
DOI
10.1109/ICEST58410.2023.10187370
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Every artificial intelligence task requires a particular dataset to train the model and test it. As the expansion of the field of AI accelerates, data is becoming a critical resource. Natural language processing is a specific field in artificial intelligence that requires separate datasets for each task and each processed language. This paper describes the process of collecting a dataset for a question answering system in the Serbian language. Data collection was achieved using the Web scraping method. The Web scraper was implemented in the Python programming language. The resulting dataset contains 16374 questions and answers in 6 different fields: history, biology, geography, physics, chemistry, and mathematics.
Pages: 147-150
Page count: 4
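
Illustrative sketch
The abstract states that the data was collected with a Web scraper written in Python, but this record does not describe the scraper itself. The following is only a minimal sketch of how question and answer pairs could be pulled out of a page with the standard-library `html.parser` module. The page layout (`<p class="question">` followed by `<p class="answer">`), the names `QAParser` and `extract_pairs`, and the sample text are all assumptions made for illustration; they are not taken from the paper or from its dataset.

```python
from html.parser import HTMLParser


class QAParser(HTMLParser):
    """Collect (question, answer) pairs from a hypothetical page layout."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # finished (question, answer) tuples
        self._role = None      # "question", "answer", or None
        self._question = None  # question still waiting for its answer
        self._buffer = []      # text fragments of the element being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class") or ""
        if tag == "p" and css_class in ("question", "answer"):
            self._role = css_class
            self._buffer = []

    def handle_data(self, data):
        if self._role is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "p" or self._role is None:
            return
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if self._role == "question":
            self._question = text
        elif self._question is not None:
            self.pairs.append((self._question, text))
            self._question = None
        self._role = None


def extract_pairs(html, field):
    """Return one record per question/answer pair, tagged with its field."""
    parser = QAParser()
    parser.feed(html)
    parser.close()
    return [{"field": field, "question": q, "answer": a}
            for q, a in parser.pairs]


# Made-up sample markup (assumption), not content from the actual dataset.
SAMPLE = """
<div class="qa"><p class="question">Koji je glavni grad Srbije?</p>
<p class="answer">Beograd</p></div>
<div class="qa"><p class="question">Koja je najduža reka u Evropi?</p>
<p class="answer">Volga</p></div>
"""

records = extract_pairs(SAMPLE, "geography")
```

In a real scraper the HTML would be fetched from the source site and the selectors adapted to its markup; tagging each record with its field mirrors the six subject areas listed in the abstract.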