Effect of Reference Genome Selection on the Performance of Computational Methods for Genome-Wide Protein-Protein Interaction Prediction

Cited by: 18

Authors:

Muley, Vijaykumar Yogesh [1,2]

Ranjan, Akash [1]

Affiliations:

[1] Ctr DNA Fingerprinting & Diagnost, Computat & Funct Genom Grp, Hyderabad, Andhra Pradesh, India

[2] Dr Babasaheb Ambedkar Marathwada Univ, Dept Biotechnol, Subctr, Osmanabad, Maharashtra, India

Source:

PLOS ONE | 2012 / Volume 7 / Issue 07

Keywords:

ESCHERICHIA-COLI; FUNCTIONAL LINKAGES; PHYLOGENETIC PROFILES; CONTEXT METHODS; GENE ORDER; NETWORKS; DATABASE; COEVOLUTION; EVOLUTION; CONSERVATION;

DOI:

10.1371/journal.pone.0042057

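Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];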

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

Background: Recent progress in computational methods for predicting physical and functional protein-protein interactions has provided new insights into the complexity of biological processes. Most of these methods assume that functionally interacting proteins are likely to share an evolutionary history. This history can be traced for the protein pairs of a query genome by correlating different evolutionary aspects of their homologs across multiple genomes, known as the reference genomes. These methods include phylogenetic profiling, gene neighborhood, and co-occurrence of the orthologous protein-coding genes in the same cluster or operon; they are collectively known as genomic context methods. A further method, mirrortree, is based on the similarity of the phylogenetic trees of two interacting proteins. Comprehensive performance analyses of these methods have frequently been reported in the literature. However, very few studies provide insight into the effect of reference genome selection on the detection of meaningful protein interactions.

Methods: We analyzed the performance of four methods and their variants to understand the effect of reference genome selection on prediction efficacy. We used six sets of reference genomes, sampled from 565 bacteria according to the phylogenetic diversity of, and relationships between, the organisms. We used Escherichia coli as a model organism and the gold standard datasets of interacting proteins reported in the DIP, EcoCyc and KEGG databases to compare the performance of the prediction methods.

Conclusions: Higher performance for predicting protein-protein interactions was achievable even with 100-150 bacterial genomes out of the 565 genomes. Inclusion of archaeal genomes in the reference genome set improves performance. We find that, to obtain good performance, it is better to sample a few genomes of related genera of prokaryotes from the large number of available genomes. Moreover, such sampling allows 50-100 genomes to be selected for comparable prediction accuracy when computational resources are limited.
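
Illustrative note (not part of the published abstract): the dependence of phylogenetic profiling on the reference genome set can be sketched as follows. This is a minimal toy example, not the authors' pipeline. It assumes binary presence/absence profiles and Pearson correlation as the similarity score; the protein names, genome names, and the helper `profile_similarity` are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors.

    Returns 0.0 when either vector is constant (correlation undefined).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def profile_similarity(profiles, protein_a, protein_b, reference_genomes):
    """Score a protein pair using only the chosen reference genomes.

    `profiles` maps protein -> {genome: 1 if a homolog is present, else 0}.
    """
    a = [profiles[protein_a][g] for g in reference_genomes]
    b = [profiles[protein_b][g] for g in reference_genomes]
    return pearson(a, b)

# Hypothetical toy profiles over four made-up reference genomes.
profiles = {
    "protA": {"g1": 1, "g2": 0, "g3": 1, "g4": 0},
    "protB": {"g1": 1, "g2": 0, "g3": 1, "g4": 0},
    "protC": {"g1": 1, "g2": 1, "g3": 0, "g4": 0},
}

all_genomes = ["g1", "g2", "g3", "g4"]
subset = ["g1", "g4"]

score_ab_all = profile_similarity(profiles, "protA", "protB", all_genomes)
score_ac_all = profile_similarity(profiles, "protA", "protC", all_genomes)
score_ac_subset = profile_similarity(profiles, "protA", "protC", subset)
```

In this toy data, protA and protC look uncorrelated across all four genomes but perfectly correlated on the two-genome subset, which shows how the score for the same protein pair can change with the choice of reference genomes.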

Pages: 13
