A Novel Computational Framework to Predict Disease-Related Copy Number Variations by Integrating Multiple Data Sources

Cited by: 6
Authors
Yuan, Lin [1]
Sun, Tao [1]
Zhao, Jing [1]
Shen, Zhen [2]
Affiliations
[1] Qilu Univ Technol, Shandong Acad Sci, Sch Comp Sci & Technol, Jinan, Peoples R China
[2] Nanyang Inst Technol, Sch Comp & Software, Nanyang, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
CNV; multi-omics data; path association analysis; stability selection; prostate cancer; ASSOCIATION ANALYSIS; WIDE ASSOCIATION; GENE-EXPRESSION; PHOSPHORYLATION; IDENTIFICATION; REGRESSION; ARCHIVES
DOI
10.3389/fgene.2021.696956
Chinese Library Classification number
Q3 [Genetics]
Discipline classification code
071007; 090102
Abstract
Copy number variation (CNV) may contribute to the development of complex diseases. However, because the mechanisms of path association are complex and sample sizes are limited, understanding the relationship between CNV and cancer remains a major challenge. The unprecedented abundance of CNV, gene, and disease label data provides an opportunity to design a new machine learning framework for predicting potential disease-related CNVs. In this paper, we developed a novel machine learning approach, IHI-BMLLR (Integrating Heterogeneous Information sources with Biweight Mid-correlation and L1-regularized Logistic Regression under stability selection), to predict CNV-disease path associations from a data set containing CNV, disease state label, and gene data. CNVs, genes, and diseases are connected by edges to form a biological association network. To construct this network, we first used a self-adaptive biweight mid-correlation (BM) formula to calculate correlation coefficients between CNVs and genes. We then used logistic regression with an L1 penalty (LLR) to detect disease-related genes. Both the self-adaptive BM and LLR steps were run under a stability selection strategy, which can effectively reduce false positives. Finally, a weighted path search algorithm was applied to find the top D path associations and important CNVs. Experimental results on both simulated and prostate cancer data show that IHI-BMLLR performs significantly better than two state-of-the-art CNV detection methods (CCRET and DPtest) under false-positive control. Furthermore, applying IHI-BMLLR to prostate cancer data revealed significant path associations. Three new cancer-related genes were discovered in these paths; they remain to be verified by future biological research.
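The pipeline described in the abstract can be illustrated with a minimal Python sketch of two of its steps. It is not the authors' implementation. `bicor` computes the standard biweight midcorrelation (with the conventional tuning constant 9), not the self-adaptive variant the paper uses, whose details are not given here. `top_paths` is a hypothetical stand-in for the weighted path search: it assumes a CNV-gene-disease path is scored by the product of the absolute edge weights, which is an assumption rather than the paper's scoring rule. The stability selection and L1-regularized logistic regression steps are omitted.

```python
import math
from statistics import median


def bicor(x, y, c=9.0):
    """Standard biweight midcorrelation of two equal-length sequences.

    Assumption: this is the textbook formula, not the paper's
    self-adaptive version.
    """
    def normalized(v):
        med = median(v)
        mad = median([abs(a - med) for a in v])
        if mad == 0:
            raise ValueError("MAD is zero; bicor is undefined for this input")
        terms = []
        for a in v:
            u = (a - med) / (c * mad)
            # Points far from the median (|u| >= 1) get zero weight.
            w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
            terms.append((a - med) * w)
        norm = math.sqrt(sum(t * t for t in terms))
        return [t / norm for t in terms]

    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    xs, ys = normalized(x), normalized(y)
    return sum(a * b for a, b in zip(xs, ys))


def top_paths(cnv_gene, gene_disease, d):
    """Rank CNV -> gene -> disease paths and return the top d.

    cnv_gene: {(cnv, gene): edge weight}, e.g. a bicor value.
    gene_disease: {gene: edge weight}, e.g. a regression coefficient.
    Hypothetical scoring: product of absolute edge weights.
    """
    scored = [
        (abs(w) * abs(gene_disease[gene]), cnv, gene)
        for (cnv, gene), w in cnv_gene.items()
        if gene in gene_disease
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:d]
```

In use, `bicor` would be evaluated for each CNV-gene pair to fill `cnv_gene`, and the genes retained by the regression step would supply `gene_disease`; the CNVs appearing in the returned paths are the candidate disease-related CNVs.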
Pages: 12