NERvous About My Health: Constructing a Bengali Medical Named Entity Recognition Dataset

Cited by: 0
Authors
Khan, Alvi Aveen [1]
Kamal, Fida [1]
Nower, Nuzhat [1]
Ahmed, Tasnim [1, 2]
Ahmed, Sabbir [1]
Chowdhury, Tareque Mohmud [1]
Affiliations
[1] Islamic Univ Technol, Dhaka, Bangladesh
[2] Queens Univ, Kingston, ON, Canada
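Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract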
The ability to identify important entities in a text, known as Named Entity Recognition (NER), is useful in a wide variety of downstream tasks in the biomedical domain. This is a considerably difficult task when working with Consumer Health Questions (CHQs), which consist of informal language used in day-to-day life by patients. These difficulties are amplified in the case of Bengali, which allows for a huge amount of flexibility in sentence structure and varies significantly across regional dialects. Unfortunately, the complexity of the language is not accurately reflected in the limited amount of available data, which makes it difficult to build a reliable decision-making system. To address the scarcity of data, this paper presents 'Bangla-HealthNER', a comprehensive dataset designed to identify named entities in health-related texts in the Bengali language. It consists of 31,783 samples sourced from a popular online public health platform, which allows it to capture the diverse range of linguistic styles and dialects used by native speakers from various regions in their day-to-day lives. The insight into this diversity in language will prove useful to any medical decision-making system developed for use in real-world applications. To highlight the difficulty of the dataset, it has been benchmarked on state-of-the-art token classification models, where BanglishBERT achieved the highest performance with an F1-score of 56.13 ± 0.75%. The dataset and all relevant code used in this work have been made publicly available.
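For readers unfamiliar with how an F1-score is typically computed for token classification, the following is a minimal sketch of entity-level (span-level) micro F1 over BIO-tagged sequences, plus a mean ± standard deviation summary over several runs. It is not the authors' evaluation code: the abstract does not state the tag set or the exact scoring protocol, so the BIO scheme, the label names `Symptom` and `Medicine`, and the per-run scores below are all assumptions made purely for illustration.

```python
from statistics import mean, stdev


def extract_spans(tags):
    """Return the set of (start, end, label) spans encoded by a BIO tag sequence."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing span
        # An open span ends at "O", at a new "B-", or when the label changes.
        if start is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != label):
            spans.add((start, i, label))
            start, label = None, None
        # Open a span on "B-"; a stray "I-" is leniently treated as a span start.
        if tag != "O" and start is None:
            start, label = i, tag[2:]
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged F1: a predicted span counts only if boundaries and label match exactly."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels and predictions (not taken from the dataset).
gold = [["B-Symptom", "I-Symptom", "O", "B-Medicine"]]
pred = [["B-Symptom", "I-Symptom", "O", "O"]]
print(f"entity-level F1: {entity_f1(gold, pred):.3f}")

# Hypothetical per-run scores, summarised as mean ± sample standard deviation.
run_scores = [0.55, 0.56, 0.57]
print(f"F1 over runs: {mean(run_scores):.2%} ± {stdev(run_scores):.2%}")
```

Exact-match span scoring is stricter than per-token accuracy, which is one reason reported NER F1-scores on noisy, informal text can look low even when most tokens are tagged correctly.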
Pages: 5768-5774
Page count: 7