Improving chemical entity recognition through h-index based semantic similarity

Cited by: 12
Authors
Lamurias, Andre [1]
Ferreira, Joao D. [1]
Couto, Francisco M. [1]
Affiliations
[1] Univ Lisbon, LaSIGE, Dept Informat, Fac Ciencias, P-1749016 Lisbon, Portugal
Source
Journal of Cheminformatics, 7
Keywords
CHEMDNER; DRUGS;
DOI
10.1186/1758-2946-7-S1-S13
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Background: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this method to two measures, simUI and simGIC, and validated the results obtained for the competition, comparing each adapted measure to its original version.

Results: For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity, which improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we were able to enhance our validation process so that, for a fixed recall, we increased precision by excluding a larger number of false positives from the results. We plotted precision and recall values for a range of validation thresholds using different similarity measures, obtaining higher precision values for the same recall with the measures based on the h-index.

Conclusions: The semantic similarity measure we introduced was more effective at validating text mining results from machine learning classifiers than the other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision values while improving the recall and F-measure.
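
The abstract names two ancestor-based measures, simUI and simGIC, and an adaptation that weighs ancestors by their h-index, but it does not give the formulas. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: simUI is taken as the ratio of shared to combined ancestors, simGIC as the same ratio weighted by information content, the h-index of a concept is assumed to be the largest h such that h of its children have at least h children each, and the adaptation is assumed to keep only ancestors whose h-index reaches a threshold. The toy hierarchy, the structure-based information content, and all function names are hypothetical.

```python
from math import log

# Hypothetical toy is-a hierarchy (not real ChEBI data): child -> parents.
PARENTS = {
    "root": [], "a": ["root"], "b": ["root"],
    "a1": ["a"], "a2": ["a"], "b1": ["b"],
    "a1x": ["a1"], "a1y": ["a1"], "a2x": ["a2"], "a2y": ["a2"],
}

def ancestors(concept):
    """Return the concept itself plus all of its ancestors."""
    seen, stack = {concept}, [concept]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

# Invert the hierarchy: parent -> direct children.
KIDS = {c: [] for c in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        KIDS[parent].append(child)

def h_index(concept):
    """Assumed definition: largest h with h children having >= h children each."""
    counts = sorted((len(KIDS[k]) for k in KIDS[concept]), reverse=True)
    h = 0
    for rank, n_children in enumerate(counts, start=1):
        if n_children >= rank:
            h = rank
    return h

# Assumed structure-based information content: -log of the fraction of
# concepts that are descendants of (or equal to) the concept.
DESC = {c: 0 for c in PARENTS}
for c in PARENTS:
    for anc in ancestors(c):
        DESC[anc] += 1
IC = {c: -log(DESC[c] / len(PARENTS)) for c in PARENTS}

def relevant_ancestors(concept, min_h):
    """Keep only ancestors whose h-index is at least min_h (assumed adaptation)."""
    return {a for a in ancestors(concept) if h_index(a) >= min_h}

def sim_ui(c1, c2, min_h=0):
    """Shared ancestors divided by combined ancestors; min_h=0 is the plain measure."""
    a1, a2 = relevant_ancestors(c1, min_h), relevant_ancestors(c2, min_h)
    union = a1 | a2
    return len(a1 & a2) / len(union) if union else 0.0

def sim_gic(c1, c2, min_h=0):
    """Same ratio as sim_ui, with each ancestor weighted by its information content."""
    a1, a2 = relevant_ancestors(c1, min_h), relevant_ancestors(c2, min_h)
    total = sum(IC[a] for a in a1 | a2)
    return sum(IC[a] for a in a1 & a2) / total if total else 0.0

if __name__ == "__main__":
    for pair in [("a1x", "a2x"), ("a1x", "b1")]:
        for min_h in (0, 2):
            print(pair, min_h, sim_ui(*pair, min_h), sim_gic(*pair, min_h))
```

In a validation setting such as the one the abstract describes, a candidate entity would be kept only if its similarity to the other entities in the same text fragment exceeds a chosen threshold; sweeping that threshold is what produces the precision and recall curves mentioned in the Results.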
Pages: 9