REMI: REGRESSION WITH MARGINAL INFORMATION AND ITS APPLICATION IN GENOME-WIDE ASSOCIATION STUDIES

Cited by: 0
Authors
Huang, Jian [1 ,2 ]
Jiao, Yuling [1 ,2 ]
Liu, Jin [1 ,2 ]
Yang, Can [1 ,2 ]
Affiliations
[1] Univ Iowa, Duke NUS Med Sch, Zhongnan Univ Econ & Law, Iowa City, IA 52242 USA
[2] Hong Kong Univ Sci & Technol, Hong Kong, Peoples R China
Keywords
Genome-wide association studies; high dimensional regression; marginal information; polygenic risk score; VARIABLE SELECTION; GENETIC ARCHITECTURE; CAUSAL VARIANTS; STATISTICS; COMMON; REGULARIZATION; HERITABILITY; LASSO; LOCI;
DOI
10.5705/ss.202019.0182
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Discipline classification code
020208 ; 070103 ; 0714 ;
Abstract
We consider the problem of variable selection and estimation in high-dimensional linear regression models when complete data are not accessible but certain marginal information or summary statistics are available. This problem is motivated by genome-wide association studies (GWASs) with millions of genotyped single nucleotide polymorphisms (SNPs), which have been widely used to identify risk variants for complex human traits and diseases. With the large number of completed GWASs, statistical methods using summary statistics have become increasingly important because individual-level data are often inaccessible. In this study, we propose the regression with marginal information (REMI) method, an ℓ1-penalized approach that uses estimated marginal effects together with a covariance matrix of the predictors estimated from external reference samples. The proposed method is highly scalable and capable of analyzing multiple GWAS data sets from hundreds of thousands of individuals and a large number of SNPs. We also establish an upper bound on the error of the REMI estimator, which has the same order as the minimax error bound of the Lasso with complete individual-level data. We conduct simulation studies to evaluate the performance of the proposed method. An interesting finding is that when a large number of marginal estimates is available together with a small number of reference samples, as in a GWAS, the proposed method yields good estimation and prediction results and outperforms the Lasso applied to complete data with a relatively small sample size. We apply the proposed method to GWAS data on 10 traits from the Northern Finland Birth Cohorts program. In particular, the real-data analysis results indicate that a summary-level-based analysis using REMI outperforms an individual-level-based analysis when the sample size of the summary-level data is larger than that of the individual-level data.
In summary, our theoretical and real-data results provide solid support for a summary-level-based analysis. As a result, polygenic risk scores for a wide variety of complex diseases can be obtained using summary statistics with theoretically guaranteed performance. The developed R package and the code to reproduce the results are available at https://github.com/gordonliu810822/REMI.
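To make the idea in the abstract concrete, the following is a minimal Python sketch of an ℓ1-penalized regression fitted from summary-level inputs only. It is not the authors' R package and may differ from the REMI estimator in its details. It assumes standardized predictors, a vector `r` of marginal effect estimates (here taken as X'y/n), and a covariance matrix `sigma` estimated from a separate reference sample, and it minimizes 0.5 * b' sigma b - r' b + lam * ||b||_1 by coordinate descent. The function names, the simulated data, and the value of `lam` are all hypothetical choices for illustration.

```python
import numpy as np


def soft_threshold(z, lam):
    """Soft-thresholding operator used by l1-penalized coordinate descent."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def l1_summary_regression(r, sigma, lam, n_iter=200, tol=1e-8):
    """Coordinate descent for min_b 0.5 * b' sigma b - r' b + lam * ||b||_1.

    r     : marginal effect estimates (assumed to be X'y / n, standardized X)
    sigma : predictor covariance estimated from a reference sample
    lam   : l1 penalty level (a tuning parameter; chosen by hand here)
    """
    r = np.asarray(r, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    b = np.zeros(len(r))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(len(r)):
            # Marginal effect of predictor j after removing the contribution
            # of all other predictors at their current coefficient values.
            rho = r[j] - sigma[j] @ b + sigma[j, j] * b[j]
            new_bj = soft_threshold(rho, lam) / sigma[j, j]
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


# Simulated illustration (all sizes and effect values are made up).
rng = np.random.default_rng(0)
n, n_ref, p = 2000, 500, 20
beta_true = np.zeros(p)
beta_true[:3] = [0.5, -0.4, 0.3]

# "Individual-level" data, used only to form the marginal summary statistics.
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)
r = X.T @ y / n

# Independent reference sample, used only to estimate the covariance matrix.
X_ref = rng.standard_normal((n_ref, p))
sigma = X_ref.T @ X_ref / n_ref

beta_hat = l1_summary_regression(r, sigma, lam=0.1)

# A polygenic-risk-score style prediction for new samples is a weighted sum
# of their predictors using the estimated coefficients.
X_new = rng.standard_normal((5, p))
prs = X_new @ beta_hat
```

In this sketch the individual-level data `X` and `y` are needed only to form `r`; the fit itself sees just `r` and `sigma`, which mirrors the setting the abstract describes. In practice the penalty level would be tuned rather than fixed by hand.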
Pages: 1985-2004
Number of pages: 20