Calculating semantic relatedness for biomedical use in a knowledge-poor environment

Cited by: 1
Authors
Rybinski, Maciej [1]
Francisco Aldana-Montes, Jose [1]
Affiliations
[1] Univ Malaga, Dept LCC, Malaga 29010, Spain
Source
BMC Bioinformatics | 2014 / Vol. 15
Keywords
SIMILARITY; DOMAIN;
DOI
10.1186/1471-2105-15-S14-S2
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Computing semantic relatedness between textual labels representing biological and medical concepts is a crucial task in many automated knowledge extraction and processing applications in the biomedical domain, particularly given the huge volume of new findings published each year. Most methods rely on highly specific resources, which reduces their usability in real-world scenarios that differ from the original assumptions. In this paper we present a simple, resource-efficient method for calculating semantic relatedness in a knowledge-poor environment. The method obtains results comparable to state-of-the-art methods while being more generic and flexible. It was designed to use only a relatively generic and small document corpus and its statistics, without referring to a previously defined knowledge base; thus it does not assume a 'closed' problem.
Results: We propose a method in which the computation for two input texts is based on comparing the vocabulary associated with the best-fit documents related to those texts. As keyterm extraction is a costly process, it is done in a preprocessing step on a per-document basis in order to limit the on-line processing. The actual computations are executed in a compact vector space, limited to the most informative extraction results. The method has been evaluated on five direct benchmarks by calculating correlation coefficients with respect to average human answers. It has also been applied to Gene-Disease and Disease-Disease data pairs to highlight its potential use as a data analysis tool. Apart from comparisons with reported results, some interesting features of the method have been studied, namely the relationship between result quality, efficiency and the applicable trimming threshold for size reduction. Experimental evaluation shows that the presented method obtains results that are comparable with current state-of-the-art methods, even surpassing them on a majority of the benchmarks. Additionally, a possible usage scenario for the method is showcased with a real-world data experiment.
Conclusions: Our method improves on the flexibility of existing methods without a notable loss of quality. It is a legitimate alternative to the costly construction of specialized knowledge-rich resources.
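The pipeline outlined in the abstract (offline per-document keyterm extraction, on-line retrieval of best-fit documents, comparison in a trimmed vector space) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the extraction or retrieval techniques, so the smoothed TF-IDF weighting, the overlap-based retrieval, the cosine comparison, the toy corpus, and all names (`build_index`, `text_vector`, `relatedness`, `top_k`, `best_n`) are assumptions made purely for illustration.

```python
import math
from collections import Counter

# Hypothetical stopword list and toy corpus, for illustration only.
STOPWORDS = {"in", "and", "to", "the", "of"}
CORPUS = [
    "insulin regulates glucose levels in diabetes patients",
    "diabetes increases glucose and affects insulin response",
    "asthma causes airway inflammation and wheezing",
    "airway inflammation in asthma responds to inhaled steroids",
]

def tokenize(text):
    return [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]

def build_index(corpus, top_k=4):
    """Offline step: weight every document, then keep only its top_k keyterms.

    top_k plays the role of a trimming threshold (an assumed stand-in).
    """
    docs = [Counter(tokenize(d)) for d in corpus]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    index = []
    for doc in docs:
        full = {t: tf * idf[t] for t, tf in doc.items()}
        top = sorted(full.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        index.append((full, dict(top)))
    return index

def text_vector(text, index, best_n=2):
    """On-line step: retrieve best-fit documents and merge their keyterms."""
    query = set(tokenize(text))
    scored = []
    for full, keyterms in index:
        score = sum(w for t, w in full.items() if t in query)
        if score > 0:
            scored.append((score, keyterms))
    scored.sort(key=lambda s: -s[0])
    vec = Counter()
    for score, keyterms in scored[:best_n]:
        for t, w in keyterms.items():
            vec[t] += score * w  # weight keyterms by how well the document fits
    return vec

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def relatedness(a, b, index):
    return cosine(text_vector(a, index), text_vector(b, index))

index = build_index(CORPUS)
print(relatedness("insulin", "glucose", index))
print(relatedness("insulin", "asthma", index))
```

In this toy setting, two labels score as related when the documents that best match them share keyterms, even if the labels themselves were trimmed from the compact vectors. Lowering `top_k` shrinks the vector space at the cost of discarding vocabulary, which loosely mirrors the quality versus efficiency trade-off the abstract mentions.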
Pages: 16