A Serbian Question Answering Dataset Created by Using the Web Scraping Technique

被引:0
|
作者
Cenic, Aleksandar B. [1 ]
Stojkovic, Suzana [1 ]
机构
[1] Univ Nis, Fac Elect Engn, Aleksandra Medvedeva 14, Nish 18000, Serbia
关键词
Question answering system; Web scraping; Question answering dataset; EXTRACTION;
D O I
10.1109/ICEST58410.2023.10187370
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Every artificial intelligence task requires a particular dataset to train the model and test it. As the expansion of the field of AI accelerates, data is becoming a critical resource. Natural language processing is a specific field in artificial intelligence that requires separate datasets for each task and each processed language. This paper describes the process of collecting a dataset for a question answering system in the Serbian language. Data collection was achieved using the Web scraping method. The Web scraper was implemented in the Python programming language. The resulting dataset contains 16374 questions and answers in 6 different fields: history, biology, geography, physics, chemistry, and mathematics.
引用
收藏
页码:147 / 150
页数:4
相关论文
共 50 条
  • [21] Question and Answer Classification in Czech Question Answering Benchmark Dataset
    Kusnirakova, Dasa
    Medved, Marek
    Horak, Ales
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE (ICAART), VOL 2, 2019, : 701 - 706
  • [22] PubMedQA: A Dataset for Biomedical Research Question Answering
    Jin, Qiao
    Dhingra, Bhuwan
    Liu, Zhengping
    Cohen, William W.
    Lu, Xinghua
    2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019): PROCEEDINGS OF THE CONFERENCE, 2019, : 2567 - 2577
  • [23] ArabicaQA: A Comprehensive Dataset for Arabic Question Answering
    Abdallah, Abdelrahman
    Kasem, Mahmoud
    Abdalla, Mahmoud
    Mahmoud, Mohamed
    Elkasaby, Mohamed
    Elbendary, Yasser
    Jatowt, Adam
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2049 - 2059
  • [24] VQuAD: Video Question Answering Diagnostic Dataset
    Gupta, Vivek
    Patro, Badri N.
    Parihar, Hemant
    Namboodiri, Vinay P.
    2022 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS (WACVW 2022), 2022, : 282 - 291
  • [25] TutorialVQA: Question Answering Dataset for Tutorial Videos
    Colas, Anthony
    Kim, Seokhwan
    Dernoncourt, Franck
    PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 5450 - 5455
  • [26] Towards a Polish Question Answering Dataset (PoQuAD)
    Tuora, Ryszard
    Zawadzka-Paluektau, Natalia
    Klamra, Cezary
    Zwierzchowska, Aleksandra
    Kobylinski, Lukasz
    FROM BORN-PHYSICAL TO BORN-VIRTUAL: AUGMENTING INTELLIGENCE IN DIGITAL LIBRARIES, ICADL 2022, 2022, 13636 : 194 - 203
  • [27] PerCQA: Persian Community Question Answering Dataset
    Jamali, Naghme
    Yaghoobzadeh, Yadollah
    Faili, Heshaam
    LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2022, : 6083 - 6092
  • [28] PRAGMATICQA: A Dataset for Pragmatic Question Answering in Conversations
    Qi, Peng
    Du, Nina
    Manning, Christopher D.
    Huang, Jing
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 6175 - 6191
  • [29] MemoriQA: A Question-Answering Lifelog Dataset
    Tran, Quang-Linh
    Nguyen, Binh
    Jones, Gareth J. F.
    Gurrin, Cathal
    PROCEEDINGS OF THE FIRST ACM WORKSHOP ON AI-POWERED QUESTION ANSWERING SYSTEMS FOR MULTIMEDIA, AIQAM 2024, 2024, : 7 - 12
  • [30] QED: A Framework and Dataset for Explanations in Question Answering
    Lamm, Matthew
    Palomaki, Jennimaria
    Alberti, Chris
    Andor, Daniel
    Choi, Eunsol
    Soares, Livio Baldini
    Collins, Michael
    TRANSACTIONS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, 2021, 9 : 790 - 806