MedNER: Enhanced Named Entity Recognition in Medical Corpus via Optimized Balanced and Deep Active Learning

Authors
Zhuang, Yan [1]
Zhang, Junyan [1]
Lu, Ruogu [1]
He, Kunlun [1]
Li, Xiuxing [2]
Affiliations
[1] Chinese People's Liberation Army General Hospital, Medical Big Data Research Center, Beijing, People's Republic of China
[2] Beijing Institute of Technology, School of Computer Science and Technology, Beijing, People's Republic of China
Funding
National Key Research and Development Program of China
Keywords
Active learning
DOI
10.1145/3678178
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Ever-growing electronic medical corpora provide unprecedented opportunities for researchers to analyze patient conditions and drug effects. However, processing large-scale electronic medical records poses severe challenges. First, newly emerging medical terms, including informal descriptions, are difficult to recognize. Second, although deep models can help extract entities from medical texts, they require large-scale labeled data, which are time-consuming to obtain and not always available in the medical domain. When many unseen concepts appear or labeled data are insufficient, the performance of existing algorithms declines sharply. In this article, we propose a balanced and deep active learning framework for Medical Named Entity Recognition (MedNER) to alleviate these problems. Specifically, to make our selection strategy precise, we first define the uncertainty of a medical sentence as its labeling loss predicted by a loss-prediction module, and define diversity as the smallest text distance between any pair of sentences in a sample batch, computed from word-morpheme embeddings. Then, to trade off uncertainty against diversity, we formulate a Distinct-K optimization problem that maximizes the minimum uncertainty and diversity of the chosen sentences. Finally, we propose a threshold-based approximate selection algorithm, Distinct-K Filter, which selects the most beneficial training samples by balancing diversity and uncertainty. Extensive experiments on real datasets demonstrate that MedNER significantly outperforms existing approaches.
CCS Concepts: • Computing methodologies → Information extraction
Additional Key Words and Phrases: Named Entity Recognition; Active Learning; Medical Text Mining
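The Distinct-K selection step summarized in the abstract can be illustrated with a small sketch. This is a minimal illustration under stated assumptions, not the authors' implementation. The uncertainty scores are taken as given and stand in for the outputs of the loss-prediction module. Cosine distance between precomputed sentence vectors stands in for the word-morpheme text distance. Both quantities are assumed to be normalized to comparable scales so that a single threshold applies to each. The helper names (`cosine_distance`, `greedy_pick`, `distinct_k_filter`) are hypothetical. For each candidate threshold τ, scanned from largest to smallest, the sketch keeps sentences with uncertainty ≥ τ. It then adds them greedily, highest uncertainty first, but only if each one lies at distance ≥ τ from every sentence already chosen. The first τ at which K sentences are found is returned, so the chosen batch has minimum uncertainty and minimum pairwise distance both at least τ.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity (hypothetical stand-in for the paper's text distance)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero vector as unrelated to everything
    return 1.0 - dot / (norm_a * norm_b)


def greedy_pick(uncertainty: Sequence[float], embeddings: Sequence[Vector],
                k: int, tau: float) -> List[int]:
    """Pick up to k sentences with uncertainty >= tau and pairwise distance >= tau."""
    # Visit the eligible sentences from most to least uncertain.
    eligible = sorted((i for i, u in enumerate(uncertainty) if u >= tau),
                      key=lambda i: uncertainty[i], reverse=True)
    chosen: List[int] = []
    for i in eligible:
        # Keep i only if it is far enough from every sentence already chosen.
        if all(cosine_distance(embeddings[i], embeddings[j]) >= tau for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return chosen


def distinct_k_filter(uncertainty: Sequence[float], embeddings: Sequence[Vector],
                      k: int) -> Tuple[List[int], float]:
    """Return k sentence indices and the largest threshold at which the greedy pass found them."""
    n = len(uncertainty)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of sentences")
    # Candidate thresholds: every uncertainty score and every pairwise distance.
    candidates = set(uncertainty)
    for i, j in combinations(range(n), 2):
        candidates.add(cosine_distance(embeddings[i], embeddings[j]))
    for tau in sorted(candidates, reverse=True):
        chosen = greedy_pick(uncertainty, embeddings, k, tau)
        if len(chosen) == k:
            return chosen, tau
    # Unreachable: at the smallest candidate every constraint holds, so k are picked.
    raise RuntimeError("no feasible threshold found")


if __name__ == "__main__":
    scores = [0.9, 0.85, 0.2, 0.8]  # toy predicted losses
    vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # toy sentence vectors
    picked, threshold = distinct_k_filter(scores, vectors, k=2)
    print(picked, round(threshold, 3))  # [0, 3] 0.4
```

In the toy run, sentence 1 is rejected despite its high score because it duplicates sentence 0. Sentence 2 is rejected because its uncertainty falls below the threshold. The greedy feasibility check is a heuristic and may miss a K-subset that exists at a higher threshold, so the result is an approximation. Scanning every pairwise distance yields O(n²) candidate thresholds. A practical version would likely restrict or subsample that candidate set.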
Pages: 24