Genome-wide association analysis by lasso penalized logistic regression

Cited by: 530
Authors
Wu, Tong Tong [5]
Chen, Yi Fang [4]
Hastie, Trevor [3, 4]
Sobel, Eric [1]
Lange, Kenneth [1, 2]
Affiliations
[1] Univ Calif Los Angeles, Dept Human Genet, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Dept Biomath, Los Angeles, CA 90095 USA
[3] Stanford Univ, Dept Biostat, Stanford, CA 94305 USA
[4] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[5] Univ Maryland, Dept Epidemiol & Biostat, College Pk, MD 20742 USA
Funding
US National Science Foundation
DOI
10.1093/bioinformatics/btp041
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: In ordinary regression, imposition of a lasso penalty makes continuous model selection straightforward. Lasso penalized regression is particularly advantageous when the number of predictors far exceeds the number of observations. Method: The present article evaluates the performance of lasso penalized logistic regression in case-control disease gene mapping with a large number of single nucleotide polymorphism (SNP) predictors. The strength of the lasso penalty can be tuned to select a predetermined number of the most relevant SNPs and other predictors. For a given value of the tuning constant, the penalized likelihood is quickly maximized by cyclic coordinate ascent. Once the most potent marginal predictors are identified, their two-way and higher order interactions can also be examined by lasso penalized logistic regression. Results: This strategy is tested on both simulated and real data. Our findings on coeliac disease replicate the previous SNP results and shed light on possible interactions among the SNPs.
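The two computational ideas in the abstract, cyclic coordinate ascent for a fixed tuning constant and tuning the penalty to keep a predetermined number of predictors, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the toy data, the fixed curvature bound used in each one-dimensional update, and the bisection search over the penalty are all assumptions made for this sketch. It uses only the standard library, so it is suitable for toy sizes only.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_lasso_logistic(X, y, lam, sweeps=200, tol=1e-8):
    """Maximize loglik(b0, b) - lam * sum(|b_j|) one coordinate at a time.

    X is a list of rows, y a list of 0/1 labels; the intercept b0 is unpenalized.
    Assumption of this sketch: each update uses the curvature bound
    0.25 * sum_i x_ij^2, so no step can decrease the penalized loglikelihood.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    eta = [0.0] * n  # linear predictor, kept in sync with (b0, b)
    col_sq = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(sweeps):
        # Intercept: plain (unpenalized) step.
        d0 = sum(y[i] - sigmoid(eta[i]) for i in range(n)) / (0.25 * n)
        b0 += d0
        eta = [e + d0 for e in eta]
        biggest = abs(d0)
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant-zero column carries no information
            g = sum(X[i][j] * (y[i] - sigmoid(eta[i])) for i in range(n))
            h = 0.25 * col_sq[j]
            new = soft_threshold(b[j] + g / h, lam / h)
            d = new - b[j]
            if d != 0.0:
                b[j] = new
                for i in range(n):
                    eta[i] += d * X[i][j]
                biggest = max(biggest, abs(d))
        if biggest < tol:
            break
    return b0, b


def tune_penalty_for_k(X, y, k, iters=30):
    """Bisect on lam until exactly k coefficients are nonzero (heuristic)."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    # At or above this value, every penalized coefficient is zero at the optimum.
    hi = max(abs(sum(X[i][j] * (y[i] - ybar) for i in range(n))) for j in range(p))
    lo = 0.0
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        b0, b = fit_lasso_logistic(X, y, lam)
        nonzero = sum(1 for v in b if v != 0.0)
        if nonzero == k:
            break
        if nonzero > k:
            lo = lam  # too many predictors kept: strengthen the penalty
        else:
            hi = lam  # too few predictors kept: weaken the penalty
    return lam, b0, b


# Toy data (made up for illustration): column 0 tracks case status, column 1 does not.
X, y = [], []
for x0, status, count in [(1, 1, 16), (1, 0, 4), (-1, 1, 4), (-1, 0, 16)]:
    for r in range(count):
        X.append([x0, 1 if r % 2 == 0 else -1])
        y.append(status)

lam, b0, b = tune_penalty_for_k(X, y, k=1)
print(lam, b0, b)
```

On this toy set the search stops as soon as only the first column keeps a nonzero coefficient. The number of nonzero coefficients is not guaranteed to change monotonically with the penalty, so the bisection is best read as a heuristic rather than an exact selection rule.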
Pages: 714-721
Page count: 8