Improving chemical entity recognition through h-index based semantic similarity

Times cited: 12

Authors:

Lamurias, Andre [1]

Ferreira, Joao D. [1]

Couto, Francisco M. [1]

Affiliation:

[1] Univ Lisbon, LaSIGE, Dept Informat, Fac Ciencias, P-1749016 Lisbon, Portugal

Source:

JOURNAL OF CHEMINFORMATICS | 2015 / Vol. 7

Keywords:

CHEMDNER; DRUGS;

DOI:

10.1186/1758-2946-7-S1-S13

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Subject classification code:

0703;

Abstract:

Background: Our approach to the BioCreative IV challenge of recognition and classification of drug names (CHEMDNER task) aimed at achieving high levels of precision by applying semantic similarity validation techniques to Chemical Entities of Biological Interest (ChEBI) mappings. Our assumption is that the chemical entities mentioned in the same fragment of text should share some semantic relation. This validation method was further improved by adapting the semantic similarity measure to take into account the h-index of each ancestor. We applied this method to two measures, simUI and simGIC, and validated the results obtained for the competition, comparing each adapted measure with its original version.

Results: For the competition, we trained a Random Forest classifier that uses various scores provided by our system, including semantic similarity; this improved the F-measure obtained with the Conditional Random Fields classifiers by 4.6%. Using a notion of concept relevance based on the h-index measure, we enhanced our validation process so that, for a fixed recall, precision increased because more false positives were excluded from the results. We plotted precision and recall values for a range of validation thresholds using different similarity measures, and the h-index-based measures obtained higher precision for the same recall.

Conclusions: The semantic similarity measure we introduced was more effective at validating text-mining results from machine learning classifiers than the other measures. We improved the results we obtained for the CHEMDNER task by maintaining high precision while improving recall and F-measure.
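The validation idea in the abstract (keep a ChEBI mapping only if it is semantically similar to another mapping in the same text fragment) can be sketched as below. simUI here is the standard ratio of shared to total ancestors, and simGIC the same ratio weighted by information content (IC). Everything else is an assumption for illustration, not the paper's definitions: the toy ontology and concept names are placeholders, IC is computed intrinsically from the hierarchy (simGIC is usually defined with annotation-based IC), a concept's h-index is taken over the child counts of its direct children, and the h-index adaptation simply scales each ancestor's weight by (1 + h).

```python
# Sketch only: every name, the toy ontology, the h-index definition for a
# concept, and the (1 + h) weighting are assumptions, not the authors' code.
from math import log

# Toy is_a graph: concept -> set of direct parents (placeholders, not ChEBI IDs).
# Every concept must appear as a key.
PARENTS = {
    "root": set(),
    "A": {"root"}, "B": {"root"},
    "A1": {"A"}, "A2": {"A"}, "A3": {"A"},
    "A1x": {"A1"}, "A1y": {"A1"}, "A2x": {"A2"}, "A2y": {"A2"},
    "B1": {"B"},
}


def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    h = 0
    for rank, value in enumerate(sorted(values, reverse=True), start=1):
        if value < rank:
            break
        h = rank
    return h


def ancestors(concept, parents):
    """The concept itself plus every ancestor reachable through is_a links."""
    seen, stack = {concept}, [concept]
    while stack:
        for p in parents.get(stack.pop(), ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def children(parents):
    """Invert the parent map into concept -> set of direct children."""
    kids = {c: set() for c in parents}
    for c, ps in parents.items():
        for p in ps:
            kids[p].add(c)
    return kids


def concept_h_index(concept, kids):
    """Assumed definition: h-index of the child counts of the direct children."""
    return h_index([len(kids[k]) for k in kids[concept]])


def intrinsic_ic(parents):
    """Assumed IC: -log(fraction of concepts subsumed by this concept)."""
    subsumed = {c: 0 for c in parents}
    for c in parents:
        for a in ancestors(c, parents):
            subsumed[a] += 1
    return {c: -log(subsumed[c] / len(parents)) for c in parents}


def weighted_overlap(a1, a2, weight):
    """Weight of shared ancestors divided by weight of all ancestors."""
    total = sum(weight(c) for c in a1 | a2)
    return sum(weight(c) for c in a1 & a2) / total if total else 0.0


def make_measures(parents):
    """Unit weights give simUI, IC weights give simGIC; *_h variants are hypothetical."""
    kids, ic = children(parents), intrinsic_ic(parents)
    h = {c: concept_h_index(c, kids) for c in parents}
    return {
        "simUI": lambda c: 1.0,
        "simGIC": lambda c: ic[c],
        "simUI_h": lambda c: 1.0 + h[c],
        "simGIC_h": lambda c: ic[c] * (1.0 + h[c]),
    }


def validate_fragment(concepts, parents, weight, threshold):
    """Keep mappings whose best similarity to another mapping in the fragment >= threshold."""
    anc = [ancestors(c, parents) for c in concepts]
    kept = []
    for i, c in enumerate(concepts):
        best = max((weighted_overlap(anc[i], anc[j], weight)
                    for j in range(len(concepts)) if j != i), default=0.0)
        if best >= threshold:
            kept.append(c)
    return kept
```

With the toy graph, `validate_fragment(["A1x", "A2x", "B1"], PARENTS, make_measures(PARENTS)["simUI"], 0.3)` keeps `A1x` and `A2x` and drops `B1`, whose only ancestor shared with the others is the root. Sweeping `threshold` over a range of values is what trades recall for precision; because the hypothetical `*_h` weights boost ancestors with a high h-index, they change which mappings survive at a given threshold.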

Pages: 9