MedNER: Enhanced Named Entity Recognition in Medical Corpus via Optimized Balanced and Deep Active Learning

Cited by: 0
Authors
Zhuang, Yan [1]
Zhang, Junyan [1]
Lu, Ruogu [1]
He, Kunlun [1]
Li, Xiuxing [2]
Affiliations
[1] Chinese People's Liberation Army General Hospital, Medical Big Data Research Center, Beijing, People's Republic of China
[2] Beijing Institute of Technology, School of Computer Science and Technology, Beijing, People's Republic of China
Funding
National Key Research and Development Program of China;
Keywords
Active learning;
DOI
10.1145/3678178
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Ever-growing electronic medical corpora provide unprecedented opportunities for researchers to analyze patient conditions and drug effects. At the same time, severe challenges have emerged in processing large-scale electronic medical records. First, emerging words for medical terms, including informal descriptions, are difficult to recognize. Second, although deep models can help extract entities from medical texts, they require large-scale labels, which are time-intensive to obtain and not always available in the medical domain. As a result, when massive numbers of unseen concepts appear or labeled data is insufficient, the performance of existing algorithms declines intolerably. In this article, we propose a balanced and deep active learning framework for Medical Named Entity Recognition (MedNER) to alleviate these problems. Specifically, to describe our selection strategy precisely, we first define the uncertainty of a medical sentence as the labeling loss predicted by a loss-prediction module, and define diversity as the smallest text distance between pairs of sentences in a sample batch, computed from word-morpheme embeddings. Furthermore, to trade off uncertainty against diversity, we formulate a Distinct-K optimization problem that maximizes the minimum uncertainty and diversity of the chosen sentences. Finally, we propose a threshold-based approximation selection algorithm, Distinct-K Filter, which selects the most beneficial training samples by balancing diversity and uncertainty. Extensive experimental results on real datasets demonstrate that MedNER significantly outperforms existing approaches.
CCS Concepts: • Computing methodologies → Information extraction;
Additional Key Words and Phrases: Named Entity Recognition; Active Learning; Medical Text Mining
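The abstract describes the selection step only at a high level, so the following is a minimal illustrative sketch, not the authors' Distinct-K Filter. It assumes that uncertainty scores and pairwise distances are already normalized to [0, 1], and it uses a simple heuristic: binary-search a threshold, and at each threshold greedily keep sentences whose uncertainty and distance to every already-kept sentence are at least that threshold. The names `greedy_filter` and `select_batch`, and the toy data, are hypothetical.

```python
from typing import Callable, List, Sequence


def greedy_filter(uncertainty: Sequence[float],
                  distance: Callable[[int, int], float],
                  threshold: float,
                  k: int) -> List[int]:
    """Keep up to k indices whose uncertainty and pairwise distances are >= threshold."""
    # Visit candidates from most to least uncertain.
    order = sorted(range(len(uncertainty)), key=lambda i: -uncertainty[i])
    chosen: List[int] = []
    for i in order:
        if uncertainty[i] < threshold:
            break  # every remaining candidate is even less uncertain
        if all(distance(i, j) >= threshold for j in chosen):
            chosen.append(i)
            if len(chosen) == k:
                break
    return chosen


def select_batch(uncertainty: Sequence[float],
                 distance: Callable[[int, int], float],
                 k: int,
                 steps: int = 20) -> List[int]:
    """Heuristically search for the largest threshold that still yields k items."""
    lo, hi = 0.0, 1.0  # assumption: scores and distances lie in [0, 1]
    best = greedy_filter(uncertainty, distance, lo, k)
    for _ in range(steps):
        mid = (lo + hi) / 2
        picked = greedy_filter(uncertainty, distance, mid, k)
        if len(picked) == k:
            best, lo = picked, mid  # feasible: try a stricter threshold
        else:
            hi = mid  # infeasible: relax the threshold
    return best


# Toy data (hypothetical): 1-D "embeddings" and predicted-loss scores.
positions = [0.0, 0.05, 0.9, 0.5]
scores = [0.9, 0.85, 0.8, 0.3]
batch = select_batch(scores, lambda i, j: abs(positions[i] - positions[j]), k=2)
```

On this toy input the sketch selects sentences 0 and 2: sentence 1 is nearly as uncertain as sentence 0 but lies almost on top of it, so a batch containing both would have very low diversity.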
Pages: 24
Related papers
50 in total
  • [21] An imConvNet-based deep learning model for Chinese medical named entity recognition
    Yuchen Zheng
    Zhenggong Han
    Yimin Cai
    Xubo Duan
    Jiangling Sun
    Wei Yang
    Haisong Huang
    BMC Medical Informatics and Decision Making, 22
  • [22] Combining self learning and active learning for Chinese Named Entity Recognition
    Yao L.
    Sun C.
    Wang X.
    Wang X.
    Journal of Software, 2010, 5 (05): 530-537
  • [23] A deep learning method for named entity recognition in bidding document
    Ji, Yunfei
    Tong, Chao
    Liang, Jun
    Yang, Xi
    Zhao, Zheng
    Wang, Xu
    2018 INTERNATIONAL CONFERENCE ON COMPUTER INFORMATION SCIENCE AND APPLICATION TECHNOLOGY, 2019, 1168
  • [24] A Hybrid Deep Learning Framework for Bacterial Named Entity Recognition
    Li, Xusheng
    Wang, Xiaoyan
    Zhong, Ran
    Zhong, Duo
    He, Tingting
    Hu, Xiaohua
    Jiang, Xingpeng
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018: 428-433
  • [25] Loss-based Active Learning for Named Entity Recognition
    Linh, Le Thai
    Nguyen, Minh-Tien
    Zuccon, Guido
    Demartini, Gianluca
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021
  • [26] Bengali Named Entity Recognition: A survey with deep learning benchmark
    Rifat, Md Jamiur Rahman
    Abujar, Sheikh
    Noori, Sheak Rashed Haider
    Hossain, Syed Akhter
    2019 10TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2019
  • [27] Military Named Entity Recognition Method Based on Deep Learning
    Wang, Xuefeng
    Yang, Ruopeng
    Lu, Yiwei
    Wu, Qingfeng
    PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018: 479-483
  • [28] Medical Named Entity Recognition Using Weakly Supervised Learning
    Long-Long Ma
    Jie Yang
    Bo An
    Shuaikang Liu
    Gaijuan Huang
    Cognitive Computation, 2022, 14: 1068-1079
  • [29] Research Progress on Named Entity Recognition in Chinese Deep Learning
    Li, Li
    Xi, Xuefeng
    Sheng, Shengli
    Cui, Zhiming
    Xu, Jiabao
    Computer Engineering and Applications, 2023, 59 (24): 46-69
  • [30] Clustering Based Active Learning for Biomedical Named Entity Recognition
    Han, Xu
    Kwoh, Chee Keong
    Kim, Jung-jae
    2016 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2016: 1253-1260