Improving chemical entity recognition through h-index based semantic similarity

Cited by: 12
Authors
Lamurias, Andre [1]
Ferreira, Joao D. [1]
Couto, Francisco M. [1]
Affiliation
[1] Univ Lisbon, LaSIGE, Dept Informat, Fac Ciencias, P-1749016 Lisbon, Portugal
Keywords
CHEMDNER; DRUGS
DOI
10.1186/1758-2946-7-S1-S13
Chinese Library Classification (CLC) number
O6 [Chemistry]
Subject classification code
0703
Abstract
Background: Our approach to the BioCreative IV challenge on the recognition and classification of drug names (the CHEMDNER task) aimed at achieving high precision by applying semantic similarity validation techniques to mappings onto Chemical Entities of Biological Interest (ChEBI). Our assumption is that chemical entities mentioned in the same fragment of text should share some semantic relation. We further improved this validation method by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this adaptation to two measures, simUI and simGIC, and validated the results obtained for the competition by comparing each adapted measure with its original version.

Results: For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity; this improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index, we enhanced our validation process so that, for a fixed recall, precision increased because a larger number of false positives was excluded from the results. We plotted precision and recall values over a range of validation thresholds for the different similarity measures, and the measures based on the h-index obtained higher precision at the same recall.

Conclusions: The semantic similarity measure we introduced was more effective at validating text-mining results from machine learning classifiers than the other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision while improving recall and F-measure.
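The validation idea described in the abstract (keep a recognized entity only if it is semantically related to the other entities found in the same text fragment) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption: the toy is-a hierarchy is hypothetical and not taken from ChEBI; simUI is written as the usual ratio of shared to total ancestors and simGIC as the same ratio weighted by information content, using an intrinsic IC computed from descendant counts; the concept h-index (largest h such that a concept has h children each with at least h children of their own), the relevance weight 1 + h, and the max-over-neighbours validation score are illustrative choices, not the formulas used by the authors.

```python
import math

# Hypothetical toy is-a hierarchy (child -> parents). Illustrative only, NOT real ChEBI data.
PARENTS = {
    "chemical entity": [],
    "molecular entity": ["chemical entity"],
    "organic compound": ["molecular entity"],
    "inorganic compound": ["molecular entity"],
    "alcohol": ["organic compound"],
    "carboxylic acid": ["organic compound"],
    "ethanol": ["alcohol"],
    "methanol": ["alcohol"],
    "propanol": ["alcohol"],
    "acetic acid": ["carboxylic acid"],
    "formic acid": ["carboxylic acid"],
    "water": ["inorganic compound"],
}

# Invert the parent map to get the direct children of each concept.
CHILDREN = {c: [] for c in PARENTS}
for child, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(child)

def _closure(start, edges):
    """Return start plus every node reachable through `edges` (reflexive closure)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(edges[node])
    return seen

def ancestors(c):
    return _closure(c, PARENTS)

def ic(c):
    """Assumed intrinsic IC: -log(share of all concepts that are c or below c)."""
    return -math.log(len(_closure(c, CHILDREN)) / len(PARENTS))

def h_index(c):
    """Assumed concept h-index: largest h with h children having >= h children each."""
    counts = sorted((len(CHILDREN[k]) for k in CHILDREN[c]), reverse=True)
    h = 0
    for rank, n in enumerate(counts, start=1):
        if n < rank:
            break
        h = rank
    return h

def weighted_overlap(a, b, weight):
    """Sum of weights over shared ancestors divided by the sum over all ancestors."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    total = sum(weight(c) for c in anc_a | anc_b)
    if total == 0:
        return 0.0
    return sum(weight(c) for c in anc_a & anc_b) / total

def sim_ui(a, b):
    return weighted_overlap(a, b, lambda c: 1.0)

def sim_gic(a, b):
    return weighted_overlap(a, b, ic)

# Hypothetical h-index-adapted variants: each ancestor's weight is scaled by 1 + h.
def sim_ui_h(a, b):
    return weighted_overlap(a, b, lambda c: 1.0 + h_index(c))

def sim_gic_h(a, b):
    return weighted_overlap(a, b, lambda c: ic(c) * (1.0 + h_index(c)))

def validate(entities, measure, threshold):
    """Keep entities whose best similarity to another entity in the fragment is >= threshold."""
    kept = []
    for i, entity in enumerate(entities):
        scores = [measure(entity, other) for j, other in enumerate(entities) if j != i]
        if max(scores, default=0.0) >= threshold:
            kept.append(entity)
    return kept

if __name__ == "__main__":
    print(round(sim_ui("ethanol", "methanol"), 3))  # → 0.667
    print(validate(["ethanol", "methanol", "water"], sim_ui, 0.5))  # → ['ethanol', 'methanol']
```

In this sketch, lowering `threshold` keeps more entities (favouring recall) and raising it discards more (favouring precision), so sweeping it for each measure yields precision/recall curves of the kind the abstract compares.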
Pages: 9
Related papers (50 in total)
  • [21] A comparison between the g-index and the h-index based on concentration
    Bartolucci, Francesco
    JOURNAL OF THE ASSOCIATION FOR INFORMATION SCIENCE AND TECHNOLOGY, 2015, 66 (12) : 2708 - 2710
  • [22] Computing the Influence of Disciplinary Keywords Based on h-IndeX
    Shen, Si
    Wu, Peng
    Wang, Dongbo
    16TH INTERNATIONAL CONFERENCE ON SCIENTOMETRICS & INFORMETRICS (ISSI 2017), 2017: 1678 - 1679
  • [23] An agent-based model for the bibliometric h-index
    Ionescu, Georgia
    Chopard, Bastien
    EUROPEAN PHYSICAL JOURNAL B, 2013, 86 (10)
  • [24] H-Classics: characterizing the concept of citation classics through H-index
    Martinez, M. A.
    Herrera, M.
    Lopez-Gijon, J.
    Herrera-Viedma, E.
    SCIENTOMETRICS, 2014, 98 (03) : 1971 - 1983
  • [27] A Complement to the H-Index: A Metric Based on Primary Authorship
    Dasgupta, Pushan
    Taegtmeyer, Heinrich
    AMERICAN JOURNAL OF MEDICINE, 2023, 136 (12): 1139 - 1140
  • [28] The Hl-index: improvement of H-index based on quality of citing papers
    Zhai, Li
    Yan, Xiangbin
    Zhu, Bin
    SCIENTOMETRICS, 2014, 98 (02) : 1021 - 1031
  • [29] GARUM: A Semantic Similarity Measure Based on Machine Learning and Entity Characteristics
    Traverso-Ribon, Ignacio
    Vidal, Maria-Esther
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, DEXA 2018, PT I, 2018, 11029 : 169 - 183
  • [30] Named Entity Recognition and Linking in Tweets Based on Linguistic Similarity
    Pipitone, Arianna
    Tirone, Giuseppe
    Pirrone, Roberto
    AI*IA 2017 ADVANCES IN ARTIFICIAL INTELLIGENCE, 2017, 10640 : 101 - 113