Hybrid medical named entity recognition using document structure and surrounding context

Authors
Mohamed Yassine Landolsi
Lotfi Ben Romdhane
Lobna Hlaoua
Affiliations
[1] ISITCom, MARS Research Lab LR17ES05, SDM Research Group
[2] University of Sousse
Keywords
Medical text mining; Named entity recognition; Machine learning; Information extraction; Electronic medical records; Section identification
Abstract
Nowadays, a huge number of electronic medical documents are written in natural language by medical specialists, and they contain useful information needed for several medical tasks. However, reading these documents to find specific information is tedious. Automatic information extraction, and Named Entity Recognition (NER) in particular, has therefore become an essential and challenging task. NER is crucial for extracting valuable information used in various medical tasks such as clinical decision support and drug safety surveillance. Capturing sufficient context is important for effective NER. In the literature, some important context information is not well exploited. Usually, a standard sequence segmentation is used, such as sentence segmentation, which may not cover sufficient context. In this paper, we propose a supervised NER method, called MedSINE (Medical Section Identification to enhance the Named Entity tagging), which treats NER as a sequence tagging task using a Bidirectional Long Short-Term Memory neural network with a Conditional Random Field (BiLSTM-CRF). To that end, we exploit layout information to segment the text into chunk sequences and to extract the parent sections of each word as features that provide sufficient context. In addition, we use clinical Bidirectional Encoder Representations from Transformers (BERT) word embeddings, Part of Speech (PoS) tags, and entity surrounding sequence features. Experiments were conducted on a manually annotated dataset of real Summary of Product Characteristics (SmPC) medical documents in PDF format and on the Colorado Richly Annotated Full Text (CRAFT) corpus.
Our model achieved an F1-measure of 89.49% and 73.52% in terms of strict matching evaluation using the SmPC and CRAFT datasets, respectively. The results show that employing the sequence of parent sections improves the F1-measure by 4.71% in terms of strict matching evaluation.
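For readers unfamiliar with strict matching evaluation, the following is a minimal sketch of how such an F1-measure is commonly computed: a predicted entity counts as correct only if its boundaries and its type both match a gold entity exactly. It is not the authors' evaluation code. The BIO tagging scheme, the function names `bio_to_spans` and `strict_f1`, and the entity types `DRUG` and `DOSE` are assumptions made for illustration only.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (assumed scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    # A trailing "O" sentinel closes an entity that ends the sequence.
    for i, tag in enumerate(tags + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current entity
        if label is not None:
            spans.add((start, i, label))  # close the current entity
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # An "I-" tag with no matching open entity is ignored in this sketch.
    return spans


def strict_f1(gold: List[str], pred: List[str]) -> float:
    """F1 where a prediction is correct only on exact span and type match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_spans)
    recall = true_positives / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: the DRUG entity is predicted one token too short,
# so under strict matching it counts as both a miss and a false alarm.
gold_tags = ["B-DRUG", "I-DRUG", "O", "B-DOSE"]
pred_tags = ["B-DRUG", "O", "O", "B-DOSE"]
score = strict_f1(gold_tags, pred_tags)
```

In this hypothetical example only the `DOSE` span matches exactly, so precision and recall are both 1/2. A partial-overlap (relaxed) criterion would score the same prediction higher, which is why the matching mode should be stated alongside reported F1 values.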
Pages: 5011-5041
Number of pages: 30