Improving chemical entity recognition through h-index based semantic similarity

Cited by: 12
Authors
Lamurias, Andre [1]
Ferreira, Joao D. [1]
Couto, Francisco M. [1]
Affiliations
[1] Univ Lisbon, LaSIGE, Dept Informat, Fac Ciencias, P-1749016 Lisbon, Portugal
Source
Keywords
CHEMDNER; DRUGS;
DOI
10.1186/1758-2946-7-S1-S13
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Background: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this adaptation to two measures, simUI and simGIC, and validated the results obtained for the competition, comparing each adapted measure to its original version.
Results: For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity, which improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we were able to enhance our validation process so that, for a fixed recall, we increased precision by excluding more false positives from the results. We plotted precision and recall values for a range of validation thresholds using different similarity measures, obtaining higher precision values for the same recall with the measures based on the h-index.
Conclusions: The semantic similarity measure we introduced was more efficient at validating text mining results from machine learning classifiers than other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision values while improving the recall and F-measure.
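The two measures named in the abstract can be illustrated with a minimal Python sketch. simUI is commonly defined as the ratio of shared ancestors to all ancestors of two terms, and simGIC as the same ratio weighted by information content. The toy hierarchy, the IC values, the function names, and in particular the way the h-index is computed for a term and used as a filter (`min_h`) are assumptions made for illustration only; the record above does not give the authors' exact formulation.

```python
from typing import Dict, Iterable, Set

# Toy is-a hierarchy (term -> direct parents). Hypothetical data, not ChEBI.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a"},
    "d": {"a"},
    "e": {"a", "b"},
}

# Hypothetical information-content values for the toy terms.
IC: Dict[str, float] = {"root": 0, "a": 1, "b": 1, "c": 2, "d": 2, "e": 2}


def ancestors(term: str, parents: Dict[str, Set[str]] = PARENTS) -> Set[str]:
    """Return the term itself plus every term reachable through its parents."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def descendants(term: str, parents: Dict[str, Set[str]] = PARENTS) -> Set[str]:
    """Return every other term that has `term` among its ancestors."""
    return {t for t in parents if t != term and term in ancestors(t, parents)}


def h_index(counts: Iterable[int]) -> int:
    """Largest h such that at least h of the counts are >= h."""
    h = 0
    for rank, count in enumerate(sorted(counts, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


def node_h_index(term: str, parents: Dict[str, Set[str]] = PARENTS) -> int:
    """Assumed analogy: children act as 'papers', their descendants as 'citations'."""
    children = [t for t, ps in parents.items() if term in ps]
    return h_index(len(descendants(child, parents)) for child in children)


def _ancestor_sets(t1: str, t2: str, min_h: int):
    """Shared and combined ancestors, keeping only terms with h-index >= min_h."""
    a1 = {t for t in ancestors(t1) if node_h_index(t) >= min_h}
    a2 = {t for t in ancestors(t2) if node_h_index(t) >= min_h}
    return a1 & a2, a1 | a2


def sim_ui(t1: str, t2: str, min_h: int = 0) -> float:
    """simUI: number of shared ancestors divided by number of all ancestors."""
    common, union = _ancestor_sets(t1, t2, min_h)
    return len(common) / len(union) if union else 0.0


def sim_gic(t1: str, t2: str, min_h: int = 0) -> float:
    """simGIC: IC of shared ancestors divided by IC of all ancestors."""
    common, union = _ancestor_sets(t1, t2, min_h)
    total = sum(IC[t] for t in union)
    return sum(IC[t] for t in common) / total if total else 0.0
```

With `min_h=0` the functions reduce to the plain measures; a positive `min_h` drops ancestors that this sketch considers less relevant before the ratios are computed. A validation step in the spirit of the abstract would then discard a recognized entity whose similarity to the other entities in the same text fragment falls below a chosen threshold.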
Pages: 9
Related papers
50 in total
  • [41] Improving Chemical Named Entity Recognition in Patents with Contextualized Word Embeddings
    Zhai, Zenan
    Dat Quoc Nguyen
    Akhondi, Saber A.
    Thorne, Camilo
    Druckenbrodt, Christian
    Cohn, Trevor
    Gregory, Michelle
    Verspoor, Karin
    SIGBIOMED WORKSHOP ON BIOMEDICAL NATURAL LANGUAGE PROCESSING (BIONLP 2019), 2019, : 328 - 338
  • [42] Agent-based model for the h-index - exact solution
    Zogala-Siudem, Barbara
    Siudem, Grzegorz
    Cena, Anna
    Gagolewski, Marek
    EUROPEAN PHYSICAL JOURNAL B, 2016, 89 (01): : 1 - 9
  • [43] Correlation between h-index, Eigenfactor™ and Article Influence™ of chemical engineering journals
    Prathap, Gangan
    CURRENT SCIENCE, 2011, 100 (09): : 1276 - 1276
  • [44] Semantic similarity based food entities recognition using WordNet
    Butt, Sahrish
    Bakhtyar, Maheen
    Noor, Waheed
    Baber, Junaid
    Ullah, Ihsan
    Ahmed, Atiq
    Basit, Abdul
    Kakar, M. Saeed H.
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (02) : 2069 - 2078
  • [45] Cp-index: a PageRank and H-index based centrality measure for collaboration network
    College of Continuous Education, University of Electronic Science and Technology of China, Chengdu, China
    J. Comput. Inf. Syst., 20 (7573-7586):
  • [46] Chinese Chemical Named Entity Recognition Based on Morpheme
    Wang, Guirong
    Xia, Bo
    Xiao, Ye
    Rao, Gaoqi
    Xun, Endong
    2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020), 2020, : 247 - 252
  • [47] Improving semantic similarity computation via subgraph feature fusion based on semantic awareness
    Deng, Yuanfei
    Bai, Wen
    Li, Jiawei
    Mao, Shun
    Jiang, Yuncheng
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 136
  • [48] Complex Entity Recognition Based on Prior Semantic Knowledge and Type Embedding
    Jiang X.-B.
    He K.
    Yan G.-Y.
    Ruan Jian Xue Bao/Journal of Software, 2023, 34 (12): : 5649 - 5669
  • [49] Quantitative analysis of automatic performance evaluation systems based on the h-index
    Hauer, Marc P.
    Hofmann, Xavier C. R.
    Krafft, Tobias D.
    Zweig, Katharina A.
    SCIENTOMETRICS, 2020, 123 (02) : 735 - 751
  • [50] Improving and Simplifying Template-Based Named Entity Recognition
    Kondragunta, Murali
    Perez-de-Vinaspre, Olatz
    Oronoz, Maite
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 79 - 86