An Efficient Nonlinear Regression Approach for Genome-wide Detection of Marginal and Interacting Genetic Variations

Times cited: 2
Authors
Lee, Seunghak [1 ]
Lozano, Aurelie [2 ]
Kambadur, Prabhanjan [3 ]
Xing, Eric P. [1 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, 5000 Forbes Ave, Pittsburgh, PA 15217 USA
[2] IBM Corp, TJ Watson Res Ctr, Yorktown Hts, NY USA
[3] Bloomberg LP, New York, NY USA
Keywords
genome-wide association mapping; SNP-SNP interaction; piecewise linear model screening; stability selection; group lasso; ALZHEIMERS-DISEASE; LATE-ONSET; ASSOCIATION; LASSO; DOPAMINE;
DOI
10.1089/cmb.2015.0202
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Genome-wide association studies have revealed individual genetic variants associated with phenotypic traits such as disease risk and gene expression. However, detecting pairwise interaction effects of genetic variants on traits remains a challenge because of the large number of variant combinations (approximately 10^11 SNP pairs in the human genome) and the relatively small sample sizes (typically <10^4). Despite recent breakthroughs in detecting interaction effects, several open problems remain: (1) how to process a large number of SNP pairs quickly, (2) how to distinguish true signals from SNPs or SNP pairs that are merely correlated with true signals, (3) how to detect nonlinear associations between SNP pairs and traits given small sample sizes, and (4) how to control false positives. In this article, we present a unified framework, called SPHINX, that addresses these challenges. We first propose a piecewise linear model for interaction detection, because it is simple enough for its parameters to be estimated from small samples yet complex enough to capture nonlinear interaction effects. Then, based on the piecewise linear model, we introduce a randomized group lasso under stability selection, together with a screening algorithm, to address the statistical and computational challenges above. In our experiments, we first demonstrate that SPHINX achieves higher power than existing methods for interaction detection under false positive control. We further apply SPHINX to a late-onset Alzheimer's disease dataset and report 16 SNPs and 17 SNP pairs associated with gene traits. We also present a highly scalable implementation of our screening algorithm, which can screen approximately 118 billion candidate associations on a 60-node cluster in <5.5 hours.
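The abstract names "randomized group lasso under stability selection" without detail. The sketch below shows the generic technique (stability selection in the style of Meinshausen and Bühlmann, with a group lasso whose per-group penalties are randomly rescaled). It is not the SPHINX implementation. It omits the piecewise linear features and the screening step. The function names, the penalty `lam`, the randomization range `alpha`, the 0.8 selection threshold, the pairing of columns into groups, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def group_lasso_support(X, y, groups, penalties, n_iter=100):
    """Block coordinate descent for (1/2n)||y - Xb||^2 + sum_g penalties[g] * ||b_g||_2.

    Each group is orthonormalized first (QR, scaled so X_g^T X_g / n = I), so the
    per-group update is an exact block soft-thresholding step.
    Returns a boolean array marking groups with nonzero coefficients.
    """
    n = X.shape[0]
    Xs = [np.linalg.qr(X[:, idx])[0] * np.sqrt(n) for idx in groups]
    betas = [np.zeros(len(idx)) for idx in groups]
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for g, Xg in enumerate(Xs):
            r_g = resid + Xg @ betas[g]          # partial residual without group g
            z = Xg.T @ r_g / n
            norm_z = np.linalg.norm(z)
            # Shrink the whole group toward zero; it drops out if ||z|| <= penalty.
            shrink = max(0.0, 1.0 - penalties[g] / norm_z) if norm_z > 0 else 0.0
            betas[g] = shrink * z
            resid = r_g - Xg @ betas[g]
    return np.array([np.any(b != 0) for b in betas])


def stability_selection(X, y, groups, lam=0.1, alpha=0.5, n_runs=100,
                        threshold=0.8, seed=0):
    """Select groups chosen in at least `threshold` of randomized subsample fits."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    counts = np.zeros(len(groups))
    for _ in range(n_runs):
        rows = rng.choice(n, size=n // 2, replace=False)   # random half of samples
        Xsub = X[rows] - X[rows].mean(axis=0)              # center: no intercept
        ysub = y[rows] - y[rows].mean()
        # Randomized penalty lam / W_g, with W_g drawn uniformly from [alpha, 1].
        weights = rng.uniform(alpha, 1.0, size=len(groups))
        counts += group_lasso_support(Xsub, ysub, groups, lam / weights)
    freq = counts / n_runs
    return np.flatnonzero(freq >= threshold), freq


# Usage on simulated data: 10 hypothetical groups of 2 columns each;
# only groups 0 (column 0) and 1 (column 3) carry signal.
rng = np.random.default_rng(1)
n, n_groups = 200, 10
X = rng.normal(size=(n, 2 * n_groups))
groups = [np.array([2 * g, 2 * g + 1]) for g in range(n_groups)]
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=n)
selected, freq = stability_selection(X, y, groups)
print(selected, np.round(freq, 2))
```

Thresholding selection frequencies across subsamples, rather than trusting a single fit, is the mechanism stability selection uses to limit false positives. Randomizing the penalties is intended to make the selected set less sensitive to correlated noise variables.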
Pages: 372-389
Number of pages: 18