Combining word embeddings to extract chemical and drug entities in biomedical literature

Cited by: 1
Authors
Lopez-Ubeda, Pilar [1]
Diaz-Galiano, Manuel Carlos [1]
Urena-Lopez, L. Alfonso [1]
Martin-Valdivia, M. Teresa [1]
Affiliations
[1] Univ Jaen, Dept Comp Sci, Adv Studies Ctr Informat & Commun Technol CEATIC, Campus Lagunillas S-N, Jaen 23071, Spain
Keywords
Natural language processing; Named entity recognition; Concept indexing; Neural network; Word embeddings; SNOMED-CT; SYSTEM
DOI
10.1186/s12859-021-04188-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Natural language processing (NLP) and text mining technologies for the extraction and indexing of chemical and drug entities are key to improving access to, and integration of, information from unstructured data such as biomedical literature.
Methods: In this paper we evaluate two important NLP tasks: named entity recognition (NER) and entity indexing using the SNOMED-CT terminology. For this purpose, we propose a combination of word embeddings to improve on the results obtained in the PharmaCoNER challenge.
Results: For the NER task we present a neural network composed of a BiLSTM with a sequential CRF layer, in which different word embeddings are combined as input to the architecture. For the concept indexing task we use a hybrid method that combines supervised and unsupervised models. The supervised model uses the training set to look up concepts that have already been seen, and the unsupervised model is based on a 6-step architecture that uses a dictionary of synonyms and the Levenshtein distance to assign the correct SNOMED-CT code.
Conclusion: On the one hand, the combination of word embeddings helps to improve the recognition of chemicals and drugs in the biomedical literature: we achieved 91.41% precision, 90.14% recall, and 90.77% F1-score using micro-averaging. On the other hand, our indexing system achieves 92.91% precision, 92.44% recall, and a 92.67% F1-score. These results would place us first in the final ranking of the challenge.
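The abstract names a synonym dictionary and the Levenshtein distance as the ingredients of the unsupervised indexing step, but does not describe the six steps themselves. The following is only a minimal sketch of how such a lookup could work, not the authors' pipeline. The function names, the `max_distance` threshold, the sample terms, and the placeholder codes (`CODE-A`, `CODE-B`) are all assumptions made for illustration; they are not real SNOMED-CT identifiers.

```python
from typing import Dict, Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def assign_code(
    mention: str,
    term_to_code: Dict[str, str],
    synonyms: Dict[str, str],
    max_distance: int = 2,  # hypothetical threshold, not from the paper
) -> Optional[Tuple[str, int]]:
    """Return (code, distance) for a mention, or None if nothing is close."""
    if not term_to_code:
        return None
    # Normalise the mention, then map it through the synonym dictionary.
    normalized = mention.strip().lower()
    normalized = synonyms.get(normalized, normalized)
    # Exact dictionary hit.
    if normalized in term_to_code:
        return term_to_code[normalized], 0
    # Otherwise fall back to the closest term by edit distance.
    best_term = min(term_to_code, key=lambda t: levenshtein(normalized, t))
    best_distance = levenshtein(normalized, best_term)
    if best_distance <= max_distance:
        return term_to_code[best_term], best_distance
    return None


# Hypothetical toy terminology; the codes are placeholders.
TERMS = {"paracetamol": "CODE-A", "ibuprofen": "CODE-B"}
SYNONYMS = {"acetaminophen": "paracetamol"}

if __name__ == "__main__":
    for text in ("Acetaminophen", "ibuprofeno", "aspirin"):
        print(text, "->", assign_code(text, TERMS, SYNONYMS))
```

In this sketch a synonym hit or an exact match returns distance 0, a near miss within the threshold returns the closest term's code, and anything further away is left unassigned rather than forced onto a wrong code.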
Pages: 17
Related papers
50 in total
  • [21] Jointly Extract Entities and Their Relations From Biomedical Text
    Chen, Jizhi
    Gu, Junzhong
    IEEE ACCESS, 2019, 7 : 162818 - 162827
  • [22] Combining Word Embeddings for Portuguese Named Entity Recognition
    da Silva, Messias Gomes
    Alves de Oliveira, Hilario Tomaz
    COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROPOR 2022, 2022, 13208 : 198 - 208
  • [23] Combining Local and Global Word Embeddings for Microblog Stemming
    Roy, Anurag
    Ghorai, Trishnendu
    Ghosh, Kripabandhu
    Ghosh, Saptarshi
    CIKM'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2017, : 2267 - 2270
  • [24] Improving Word Embeddings via Combining with Complementary Languages
    Li, Changliang
    Xu, Bo
    Wu, Gaowei
    Zhuang, Tao
    Wang, Xiuying
    Ge, Wendong
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CANADIAN AI 2014, 2014, 8436 : 313 - 318
  • [25] Intelligent multi-document summarization for biomedical literature by word embeddings and graph-based ranking
    Shen, Chen
    Lin, Hongfei
    Hao, Huihui
    Yang, Zhihao
    Wang, Jian
    Zhang, Shaowu
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2019, 37 (04) : 4797 - 4802
  • [26] BioWordVec, improving biomedical word embeddings with subword information and MeSH
    Zhang, Yijia
    Chen, Qingyu
    Yang, Zhihao
    Lin, Hongfei
    Lu, Zhiyong
    SCIENTIFIC DATA, 2019, 6 (1)
  • [28] On Using Composite Word Embeddings To Improve Biomedical Term Similarity
    Singh, Abhishek
    Jin, Wei
    2020 IEEE 20TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE 2020), 2020, : 281 - 287
  • [29] Extracting Biomedical Event with Dual Decomposition Integrating Word Embeddings
    Li, Lishuang
    Liu, Shanshan
    Qin, Meiyue
    Wang, Yiwen
    Huang, Degen
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2016, 13 (04) : 669 - 677
  • [30] Combining Knowledge Graph and Word Embeddings for Spherical Topic Modeling
    Ennajari, Hafsa
    Bouguila, Nizar
    Bentahar, Jamal
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (07) : 3609 - 3623