PUMA: A Unified Framework for Penalized Multiple Regression Analysis of GWAS Data

Cited by: 27
Authors
Hoffman, Gabriel E. [1]
Logsdon, Benjamin A. [1,2]
Mezey, Jason G. [1,3]
Affiliations
[1] Cornell Univ, Dept Biol Stat & Computat Biol, Ithaca, NY 14850 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[3] Weill Cornell Med Coll, Dept Med Genet, New York, NY USA
Funding
National Science Foundation (USA); National Institutes of Health (USA);
Keywords
GENOME-WIDE ASSOCIATION; VARIATIONAL BAYES ALGORITHM; VARIABLE SELECTION; STABILITY SELECTION; SUSCEPTIBILITY LOCI; BIOLOGICAL PATHWAYS; GENETIC-VARIANTS; MISSING-DATA; IDENTIFIES; METAANALYSIS;
DOI
10.1371/journal.pcbi.1003101
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Penalized Multiple Regression (PMR) can be used to discover novel disease associations in GWAS datasets. In practice, proposed PMR methods have not been able to identify well-supported associations in GWAS that are undetectable by standard association tests and thus these methods are not widely applied. Here, we present a combined algorithmic and heuristic framework for PUMA (Penalized Unified Multiple-locus Association) analysis that solves the problems of previously proposed methods including computational speed, poor performance on genome-scale simulated data, and identification of too many associations for real data to be biologically plausible. The framework includes a new minorize-maximization (MM) algorithm for generalized linear models (GLM) combined with heuristic model selection and testing methods for identification of robust associations. The PUMA framework implements the penalized maximum likelihood penalties previously proposed for GWAS analysis (i.e. Lasso, Adaptive Lasso, NEG, MCP), as well as a penalty that has not been previously applied to GWAS (i.e. LOG). Using simulations that closely mirror real GWAS data, we show that our framework has high performance and reliably increases power to detect weak associations, while existing PMR methods can perform worse than single marker testing in overall performance. To demonstrate the empirical value of PUMA, we analyzed GWAS data for type 1 diabetes, Crohn's disease, and rheumatoid arthritis, three autoimmune diseases from the original Wellcome Trust Case Control Consortium.
Our analysis replicates known associations for these diseases and we discover novel etiologically relevant susceptibility loci that are invisible to standard single marker tests, including six novel associations implicating genes involved in pancreatic function, insulin pathways and immune-cell function in type 1 diabetes; three novel associations implicating genes in pro- and anti-inflammatory pathways in Crohn's disease; and one novel association implicating a gene involved in apoptosis pathways in rheumatoid arthritis. We provide software for applying our PUMA analysis framework.
Pages: 19
Related papers
50 in total
  • [31] Classification of microarray data with penalized logistic regression
    Eilers, PHC
    Boer, JM
    van Ommen, GJ
    van Houwelingen, HC
    MICROARRAYS: OPTICAL TECHNOLOGIES AND INFORMATICS, 2001, 4266 : 187 - 198
  • [32] A Unified Framework for Association Analysis with Multiple Related Phenotypes
    Stephens, Matthew
    PLOS ONE, 2013, 8 (07):
  • [33] A Unified Framework for Deep Symbolic Regression
    Landajuela, Mikel
    Lee, Chak Shing
    Yang, Jiachen
    Glatt, Ruben
    Santiago, Claudio
    Mundhenk, T. Nathan
    Aravena, Ignacio
    Mulcahy, Garrett
    Petersen, Brenden
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [34] Structured Penalized Logistic Regression for Gene Selection in Gene Expression Data Analysis
    Liu, Cheng
    Wong, Hau San
    IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS, 2019, 16 (01) : 312 - 321
  • [35] MULTIPLE REGRESSION ANALYSIS OF SOIL DATA
    WADLEIGH, CH
    FIREMAN, M
    SOIL SCIENCE, 1954, 78 (02) : 127 - 139
  • [36] MULTIPLE REGRESSION ANALYSIS OF SOIL DATA
    WADLEIGH, CH
    BIOMETRICS, 1951, 7 (03) : 301 - 301
  • [37] Penalized functional regression analysis of white-matter tract profiles in multiple sclerosis
    Goldsmith, Jeff
    Crainiceanu, Ciprian M.
    Caffo, Brian S.
    Reich, Daniel S.
    NEUROIMAGE, 2011, 57 (02) : 431 - 439
  • [38] A Unified Framework for Outlier Detection in Trace Data Analysis
    Li, Zhiguo
    Baseman, Robert J.
    Zhu, Yada
    Tipu, Fateh A.
    Slonim, Noam
    Shpigelman, Lavi
    IEEE TRANSACTIONS ON SEMICONDUCTOR MANUFACTURING, 2014, 27 (01) : 95 - 103
  • [39] A unified delay analysis framework for opportunistic data collection
    Dong Zhao
    Huadong Ma
    Qi Li
    Shaojie Tang
    Wireless Networks, 2018, 24 : 1313 - 1325
  • [40] Penalized regression procedures for variable selection in the potential outcomes framework
    Ghosh, Debashis
    Zhu, Yeying
    Coffman, Donna L.
    STATISTICS IN MEDICINE, 2015, 34 (10) : 1645 - 1658