Improving compound-protein interaction prediction by building up highly credible negative samples

Cited by: 214
Authors
Liu, Hui [1, 2]
Sun, Jianjiang [3, 4]
Guan, Jihong [5]
Zheng, Jie [2]
Zhou, Shuigeng [3, 4]
Affiliations
[1] Changzhou Univ, Lab Informat Management, Changzhou 213164, Jiangsu, Peoples R China
[2] Nanyang Technol Univ, Sch Comp Engn, Singapore 639798, Singapore
[3] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, Shanghai 200433, Peoples R China
[4] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
[5] Tongji Univ, Dept Comp Sci & Technol, Shanghai 201804, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DRUG-TARGET INTERACTIONS; INTERACTION NETWORKS; IDENTIFICATION; INTEGRATION; RESOURCE; KERNELS; MODE;
DOI
10.1093/bioinformatics/btv256
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Motivation: Computational prediction of compound-protein interactions (CPIs) is of great importance for drug design and development, as genome-scale experimental validation of CPIs is not only time-consuming but also prohibitively expensive. Although an increasing number of validated interactions are available, the performance of computational prediction approaches is severely impeded by the lack of reliable negative CPI samples. A systematic method of screening reliable negative samples is therefore critical to improving the performance of in silico prediction methods.
Results: This article aims at building up a set of highly credible negative samples of CPIs via an in silico screening method. Most existing computational models assume that similar compounds are likely to interact with similar target proteins, and they achieve remarkable performance. It is therefore rational to identify potential negative samples based on the converse proposition: proteins dissimilar to every known/predicted target of a compound are unlikely to be targeted by that compound, and vice versa. We integrated various resources, including chemical structures, chemical expression profiles and side effects of compounds, and amino acid sequences, the protein-protein interaction network and functional annotations of proteins, into a systematic screening framework. We first tested the screened negative samples on six classical classifiers, and all of these classifiers achieved remarkably higher performance on our negative samples than on randomly generated negative samples for both human and Caenorhabditis elegans. We then verified the negative samples on three existing prediction models, namely the bipartite local model, Gaussian kernel profile and Bayesian matrix factorization, and found that the performance of these models is also significantly improved on the screened negative samples. Moreover, we validated the screened negative samples on a drug bioactivity dataset. Finally, we derived two sets of new interactions by training a support vector machine classifier on the positive interactions annotated in DrugBank and our screened negative interactions. The screened negative samples and the predicted interactions provide the research community with a useful resource for identifying new drug targets and a helpful supplement to the current curated compound-protein databases.
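The screening idea stated in the abstract (a protein dissimilar to every known target of a compound, and a compound dissimilar to every known ligand of that protein, form a credible negative pair) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `screen_negatives`, the single pre-combined similarity matrix per side, and the cutoff `threshold=0.3` are all assumptions made for illustration, and the abstract does not say how the individual resources are combined or scored.

```python
import numpy as np


def screen_negatives(interactions, protein_sim, compound_sim, threshold=0.3):
    """Return (compound, protein) index pairs that look like credible negatives.

    interactions : (n_compounds, n_proteins) 0/1 matrix of known/predicted CPIs
    protein_sim  : (n_proteins, n_proteins) similarity scores in [0, 1]
    compound_sim : (n_compounds, n_compounds) similarity scores in [0, 1]
    threshold    : hypothetical cutoff; not a value taken from the paper
    """
    interactions = np.asarray(interactions, dtype=bool)
    protein_sim = np.asarray(protein_sim, dtype=float)
    compound_sim = np.asarray(compound_sim, dtype=float)

    negatives = []
    # Only pairs without a known interaction are candidates.
    for c, p in zip(*np.nonzero(~interactions)):
        targets = np.nonzero(interactions[c])[0]     # known targets of compound c
        ligands = np.nonzero(interactions[:, p])[0]  # known ligands of protein p
        # Highest similarity of p to any known target of c, and of c to any
        # known ligand of p. With no known partners the score defaults to 0.0
        # (an assumption of this sketch, which lets such pairs pass).
        max_prot = protein_sim[p, targets].max() if targets.size else 0.0
        max_comp = compound_sim[c, ligands].max() if ligands.size else 0.0
        if max_prot < threshold and max_comp < threshold:
            negatives.append((int(c), int(p)))
    return negatives


if __name__ == "__main__":
    known = [[1, 0, 0],
             [0, 1, 0]]
    prot_sim = [[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.2],
                [0.1, 0.2, 1.0]]
    comp_sim = [[1.0, 0.8],
                [0.8, 1.0]]
    print(screen_negatives(known, prot_sim, comp_sim))
```

In this toy input, protein 2 is dissimilar to the known targets of both compounds, so only pairs involving protein 2 survive the screen; pairs involving proteins 0 and 1 are rejected because those proteins are highly similar to each other.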
Pages: 221-229
Page count: 9