A Novel Computational Framework to Predict Disease-Related Copy Number Variations by Integrating Multiple Data Sources

Cited by: 6
Authors
Yuan, Lin [1]
Sun, Tao [1]
Zhao, Jing [1]
Shen, Zhen [2]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Sch Comp Sci & Technol, Jinan, Peoples R China
[2] Nanyang Inst Technol, Sch Comp & Software, Nanyang, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
CNV; multi-omics data; path association analysis; stability selection; prostate cancer; ASSOCIATION ANALYSIS; WIDE ASSOCIATION; GENE-EXPRESSION; PHOSPHORYLATION; IDENTIFICATION; REGRESSION; ARCHIVES;
DOI
10.3389/fgene.2021.696956
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Copy number variation (CNV) may contribute to the development of complex diseases. However, because the mechanisms underlying path associations are complex and sample sizes are often insufficient, understanding the relationship between CNV and cancer remains a major challenge. The growing abundance of CNV, gene, and disease label data provides an opportunity to design a new machine learning framework for predicting potential disease-related CNVs. In this paper, we developed a novel machine learning approach, IHI-BMLLR (Integrating Heterogeneous Information sources with Biweight Mid-correlation and L1-regularized Logistic Regression under stability selection), to predict CNV-disease path associations from a data set containing CNV, disease state label, and gene data. CNVs, genes, and diseases are connected by edges and together constitute a biological association network. To construct this network, we first used a self-adaptive biweight mid-correlation (BM) formula to calculate correlation coefficients between CNVs and genes. We then used logistic regression with an L1 penalty (LLR) to detect disease-related genes. In both steps, we added a stability selection strategy, which can effectively reduce false positives. Finally, a weighted path search algorithm was applied to find the top D path associations and important CNVs. Experimental results on both simulated and prostate cancer data show that IHI-BMLLR performs significantly better than two state-of-the-art CNV detection methods (CCRET and DPtest) under false-positive control. Furthermore, applying IHI-BMLLR to prostate cancer data revealed significant path associations. Three new cancer-related genes were discovered in these paths; they need to be verified by future biological research.
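The first step described in the abstract scores CNV-gene pairs with a biweight mid-correlation. As a minimal sketch, the block below implements the standard (non-adaptive) biweight midcorrelation in plain Python. This record does not specify the paper's self-adaptive variant, so the fixed tuning constant 9 and the names `biweight_midcorrelation` and `_biweight_terms` are assumptions made for illustration, not the authors' implementation.

```python
from math import sqrt
from statistics import median


def _biweight_terms(values, c=9.0):
    """Return the normalized, biweight-weighted deviations of `values`.

    `c` is the conventional tuning constant (an assumption here; the
    paper's self-adaptive variant may choose it differently).
    """
    med = median(values)
    # Median absolute deviation (MAD) as the robust scale estimate.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; biweight midcorrelation is undefined")

    weighted = []
    for v in values:
        u = (v - med) / (c * mad)
        # Points farther than c * MAD from the median get weight zero.
        w = (1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0
        weighted.append((v - med) * w)

    norm = sqrt(sum(t * t for t in weighted))
    return [t / norm for t in weighted]


def biweight_midcorrelation(x, y):
    """Robust correlation between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = _biweight_terms(x)
    b = _biweight_terms(y)
    return sum(p * q for p, q in zip(a, b))


if __name__ == "__main__":
    # Hypothetical toy data: one CNV profile and one gene expression
    # profile that agree except for a single extreme sample.
    cnv = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 100.0]
    gene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, -100.0]
    print(biweight_midcorrelation(cnv, gene))
```

In the toy data, the final sample lies far outside 9 MADs of the median in both profiles, so it receives zero weight and the score reflects the agreement of the remaining samples. An ordinary Pearson correlation would instead be dominated by that single pair.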
Pages: 12