Combining word embeddings to extract chemical and drug entities in biomedical literature

Cited by: 1
Authors
Lopez-Ubeda, Pilar [1]
Diaz-Galiano, Manuel Carlos [1]
Urena-Lopez, L. Alfonso [1]
Martin-Valdivia, M. Teresa [1]
Affiliations
[1] Univ Jaen, Dept Comp Sci, Adv Studies Ctr Informat & Commun Technol CEATIC, Campus Lagunillas S-N, Jaen 23071, Spain
Keywords
Natural language processing; Named entity recognition; Concept indexing; Neural network; Word embeddings; SNOMED-CT; SYSTEM;
DOI
10.1186/s12859-021-04188-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Background: Natural language processing (NLP) and text mining technologies for extracting and indexing chemical and drug entities are key to improving access to, and integration of, information from unstructured data such as biomedical literature.
Methods: In this paper we evaluate two important NLP tasks: named entity recognition (NER) and entity indexing using the SNOMED-CT terminology. For this purpose, we propose a combination of word embeddings in order to improve on the results obtained in the PharmaCoNER challenge.
Results: For the NER task we present a neural network composed of a BiLSTM with a CRF sequential layer, in which different word embeddings are combined as input to the architecture. A hybrid method combining supervised and unsupervised models is used for the concept indexing task. In the supervised model, we use the training set to find previously trained concepts; the unsupervised model is based on a 6-step architecture that uses a dictionary of synonyms and the Levenshtein distance to assign the correct SNOMED-CT code.
Conclusion: On the one hand, the combination of word embeddings helps to improve the recognition of chemicals and drugs in the biomedical literature: we achieved 91.41% precision, 90.14% recall, and a 90.77% F1-score using micro-averaging. On the other hand, our indexing system achieves 92.91% precision, 92.44% recall, and a 92.67% F1-score. These results would place us first in the final ranking.
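As an illustration of the dictionary-plus-Levenshtein step the abstract mentions, the following is a minimal Python sketch, not the authors' 6-step architecture. The function names `levenshtein` and `assign_code`, the `max_distance` threshold, the sample terms, and the `CODE-…` identifiers are all hypothetical placeholders; they are not real SNOMED-CT codes and do not come from the paper.

```python
from typing import Dict, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def assign_code(mention: str,
                dictionary: Dict[str, str],
                max_distance: int = 2) -> Optional[Tuple[str, int]]:
    """Return (code, distance) for the closest dictionary term, or None.

    An exact match (after lowercasing) wins immediately; otherwise the
    nearest term within `max_distance` edits is used. The threshold is an
    assumption chosen for this sketch.
    """
    normalized = mention.strip().lower()
    if normalized in dictionary:
        return dictionary[normalized], 0
    best: Optional[Tuple[str, int]] = None
    for term, code in dictionary.items():
        distance = levenshtein(normalized, term)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (code, distance)
    return best


# Hypothetical synonym dictionary: several surface forms share one code.
# The codes are placeholders, NOT real SNOMED-CT identifiers.
SYNONYMS = {
    "paracetamol": "CODE-001",
    "acetaminofen": "CODE-001",
    "ibuprofeno": "CODE-002",
}

if __name__ == "__main__":
    for mention in ("Paracetamol", "ibuprofen", "aspirina"):
        print(mention, "->", assign_code(mention, SYNONYMS))
```

In this sketch a mention with no dictionary term within the threshold returns `None`, so unmatched mentions can be handled by a separate step rather than being forced onto the nearest code.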
Pages: 17
Related papers
50 in total
  • [1] Combining word embeddings to extract chemical and drug entities in biomedical literature
    Pilar López-Úbeda
    Manuel Carlos Díaz-Galiano
    L. Alfonso Ureña-López
    M. Teresa Martín-Valdivia
    BMC Bioinformatics, 22
  • [2] Biomedical entities recognition in Spanish combining word embeddings
    Lopez-Ubeda, Pilar
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2022, (68): 149-152
  • [3] Thesaurus-based word embeddings for automated biomedical literature classification
    Dimitrios A. Koutsomitropoulos
    Andreas D. Andriopoulos
    Neural Computing and Applications, 2022, 34: 937-950
  • [4] Thesaurus-based word embeddings for automated biomedical literature classification
    Koutsomitropoulos, Dimitrios A.
    Andriopoulos, Andreas D.
    NEURAL COMPUTING & APPLICATIONS, 2022, 34 (02): 937-950
  • [5] Biomedical Word Sense Disambiguation with Word Embeddings
    Antunes, Rui
    Matos, Sergio
    11TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS, 2017, 616: 273-279
  • [6] CheNER: a tool for the identification of chemical entities and their classes in biomedical literature
    Anabel Usié
    Joaquim Cruz
    Jorge Comas
    Francesc Solsona
    Rui Alves
    Journal of Cheminformatics, 7
  • [7] CheNER: a tool for the identification of chemical entities and their classes in biomedical literature
    Usie, Anabel
    Cruz, Joaquim
    Comas, Jorge
    Solsona, Francesc
    Alves, Rui
    JOURNAL OF CHEMINFORMATICS, 2015, 7
  • [8] Dependency and AMR Embeddings for Drug-Drug Interaction Extraction from Biomedical Literature
    Wang, Yanshan
    Liu, Sijia
    Rastegar-Mojarad, Majid
    Wang, Liwei
    Shen, Feichen
    Liu, Fei
    Liu, Hongfang
    ACM-BCB 2017: PROCEEDINGS OF THE 8TH ACM INTERNATIONAL CONFERENCE ON BIOINFORMATICS, COMPUTATIONAL BIOLOGY, AND HEALTH INFORMATICS, 2017: 36-43
  • [9] Improved biomedical word embeddings in the transformer era
    Noh, Jiho
    Kavuluru, Ramakanth
    JOURNAL OF BIOMEDICAL INFORMATICS, 2021, 120 (120)
  • [10] Biomedical Semantic Embeddings: Using hybrid sentences to construct biomedical word embeddings and its applications
    Shaik, Arshad
    Jin, Wei
    2019 IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (ICHI), 2019