MedNER: Enhanced Named Entity Recognition in Medical Corpus via Optimized Balanced and Deep Active Learning

Cited by: 0
Authors
Zhuang, Yan [1]
Zhang, Junyan [1]
Lu, Ruogu [1]
He, Kunlun [1]
Li, Xiuxing [2]
Affiliations
[1] Chinese People's Liberation Army General Hospital, Medical Big Data Research Center, Beijing, People's Republic of China
[2] Beijing Institute of Technology, School of Computer Science and Technology, Beijing, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
Active learning;
DOI
10.1145/3678178
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Ever-growing electronic medical corpora provide unprecedented opportunities for researchers to analyze patient conditions and drug effects. Meanwhile, severe challenges have emerged in the processing of large-scale electronic medical records. First, emerging medical terms, including informal descriptions, are difficult to recognize. Moreover, although deep models can help extract entities from medical texts, they require large-scale labels, which are time-intensive to obtain and not always available in the medical domain. When massive numbers of unseen concepts appear or labeled data is insufficient, the performance of existing algorithms declines intolerably. In this article, we propose a balanced and deep active learning framework for Medical Named Entity Recognition (MedNER) to alleviate the above problems. Specifically, to describe our selection strategy precisely, we first define the uncertainty of a medical sentence as a labeling loss predicted by a loss-prediction module, and define diversity as the least text distance between pairs of sentences in a sample batch, computed from word-morpheme embeddings. Furthermore, to make a trade-off between uncertainty and diversity, we formulate a Distinct-K optimization problem that maximizes the minimum uncertainty and diversity of the chosen sentences. Finally, we propose a threshold-based approximation selection algorithm, Distinct-K Filter, which selects the most beneficial training samples by balancing diversity and uncertainty. Extensive experimental results on real datasets demonstrate that MedNER significantly outperforms existing approaches.
CCS Concepts: • Computing methodologies → Information extraction;
Additional Key Words and Phrases: Named Entity Recognition; Active Learning; Medical Text Mining
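The selection idea in the abstract (keep only sentences that are both uncertain and far from each other, using a threshold) can be illustrated with a minimal Python sketch. This is not the authors' Distinct-K Filter: the abstract does not give the algorithm's details, so the function names `filter_at_threshold` and `select_batch`, the use of Euclidean distance in place of the paper's text distance, the assumption that uncertainty and distance are both scaled to [0, 1], and the coarse threshold grid are all assumptions made here for illustration.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance; a stand-in (assumption) for the paper's text distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_at_threshold(
    uncertainty: Sequence[float], vectors: Sequence[Vector], k: int, tau: float
) -> Optional[List[int]]:
    """Greedily pick k sentences whose uncertainty is >= tau and whose
    pairwise distance is >= tau. Return None if fewer than k qualify."""
    # Visit sentences from most to least uncertain.
    order = sorted(range(len(uncertainty)), key=lambda i: uncertainty[i], reverse=True)
    chosen: List[int] = []
    for i in order:
        if uncertainty[i] < tau:
            break  # every remaining sentence is less uncertain than tau
        if all(distance(vectors[i], vectors[j]) >= tau for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                return chosen
    return None


def select_batch(
    uncertainty: Sequence[float], vectors: Sequence[Vector], k: int, steps: int = 20
) -> List[int]:
    """Try thresholds from 1.0 down to 0.0 and return the first feasible batch,
    i.e. the batch found at the largest threshold on the grid."""
    for s in range(steps, -1, -1):
        chosen = filter_at_threshold(uncertainty, vectors, k, s / steps)
        if chosen is not None:
            return chosen
    return []  # fewer than k sentences available


# Hypothetical toy data: sentences 0 and 1 are both uncertain but near-duplicates.
scores = [0.9, 0.85, 0.5, 0.4]
points = [(0.0, 0.0), (0.05, 0.0), (1.0, 0.0), (0.0, 1.0)]
batch = select_batch(scores, points, k=2)
print(batch)
```

In this toy example the second-most-uncertain sentence is skipped because it nearly duplicates the first, which is the kind of balance between uncertainty and diversity the abstract describes. A pure top-k uncertainty ranking would have picked both near-duplicates.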
Pages: 24