Comparative analysis of five protein-protein interaction corpora

Cited by: 113
Authors
Pyysalo, Sampo [1]
Airola, Antti
Heimonen, Juho
Bjorne, Jari
Ginter, Filip
Salakoski, Tapio
Affiliations
[1] Univ Turku, TUCS, FIN-20520 Turku, Finland
Keywords
PubMed Abstract; Entity Annotation; Entity Pair; Corpus Annotation; Annotate Entity
DOI
10.1186/1471-2105-9-S3-S6
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Growing interest in the application of natural language processing methods to biomedical text has led to an increasing number of corpora and methods targeting protein-protein interaction (PPI) extraction. However, there is no general consensus regarding PPI annotation and consequently resources are largely incompatible and methods are difficult to evaluate.
Results: We present the first comparative evaluation of the diverse PPI corpora, performing quantitative evaluation using two separate information extraction methods as well as detailed statistical and qualitative analyses of their properties. For the evaluation, we unify the corpus PPI annotations to a shared level of information, consisting of undirected, untyped binary interactions of non-static types with no identification of the words specifying the interaction, no negations, and no interaction certainty. We find that the F-score performance of a state-of-the-art PPI extraction method varies on average 19 percentage units and in some cases over 30 percentage units between the different evaluated corpora. The differences stemming from the choice of corpus can thus be substantially larger than differences between the performance of PPI extraction methods, which suggests definite limits on the ability to compare methods evaluated on different resources. We analyse a number of potential sources for these differences and identify factors explaining approximately half of the variance. We further suggest ways in which the difficulty of the PPI extraction tasks codified by different corpora can be determined to advance comparability. Our analysis also identifies points of agreement and disagreement in PPI corpus annotation that are rarely explicitly stated by the authors of the corpora.
Conclusions: Our comparative analysis uncovers key similarities and differences between the diverse PPI corpora, thus taking an important step towards standardization. In the course of this study we have created a major practical contribution in converting the corpora into a shared format. The conversion software is freely available at http://mars.cs.utu.fi/PPICorpora.
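As a rough illustration of the shared annotation level described in the abstract, the sketch below reduces typed, directed interactions to undirected, untyped entity pairs and scores a prediction set against a gold set with the F-score. It is not the authors' conversion software: the function names `unify` and `f_score`, the triple layout (agent, type, target), and the toy protein identifiers are all assumptions made for this example.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# An undirected, untyped binary interaction: just the two entity identifiers.
Pair = FrozenSet[str]


def unify(interactions: Iterable[Tuple[str, str, str]]) -> Set[Pair]:
    """Drop the interaction type and direction from (agent, type, target) triples.

    Hypothetical helper: the triple layout is an assumption, not a corpus format.
    """
    return {frozenset((agent, target)) for agent, _type, target in interactions}


def f_score(gold: Set[Pair], predicted: Set[Pair]) -> float:
    """Harmonic mean of precision and recall over entity pairs."""
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented identifiers and interaction types).
gold = unify([("P1", "binds", "P2"), ("P3", "phosphorylates", "P4")])
pred = unify([("P2", "interacts", "P1"), ("P1", "interacts", "P3")])
score = f_score(gold, pred)  # one of two predictions and one of two gold pairs match
```

Because pairs are stored as `frozenset` values, ("P1", "P2") and ("P2", "P1") compare equal, which mirrors the undirected comparison the abstract describes. Negation and certainty are simply not represented in this sketch.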
Pages: 11