Principal Component Regression and Linear Mixed Model in Association Analysis of Structured Samples: Competitors or Complements?

Cited by: 31
Authors
Zhang, Yiwei [1]
Pan, Wei [1]
Affiliation
[1] Univ Minnesota, Sch Publ Hlth, Div Biostat, Minneapolis, MN 55455 USA
Funding
European Research Council;
Keywords
association testing; confounding; environmental risk; population stratification; probabilistic principal component analysis; VARIANTS; SCALE;
DOI
10.1002/gepi.21879
Chinese Library Classification
Q3 [Genetics];
Discipline Classification Code
071007; 090102;
Abstract
Genome-wide association studies (GWAS) have been established as a major tool for identifying genetic variants associated with complex traits, such as common diseases. However, GWAS may suffer from false positives and false negatives due to confounding population structure, including known or unknown relatedness. Another important issue is unmeasured environmental risk factors. Among the many methods for adjusting for population structure, two approaches stand out. One is principal component regression (PCR) based on principal component analysis, which is perhaps the most popular owing to its early appearance, simplicity, and general effectiveness; the other is based on a linear mixed model (LMM), which has recently emerged as perhaps the most flexible and effective approach, especially for samples with complex structures such as those in model organisms. As shown previously, the PCR approach can be regarded as an approximation to an LMM; the quality of this approximation depends on the number of top principal components (PCs) used, which is often difficult to choose in practice. Hence, in the presence of population structure, the LMM appears to outperform PCR. However, because the two approaches differ in their treatment of fixed vs. random effects, we show an advantage of PCR over LMM: in the presence of an unknown but spatially confined environmental confounder (e.g., environmental pollution or lifestyle), the PCs may implicitly and effectively adjust for the confounder, whereas the LMM cannot. Accordingly, to adjust for both population structure and nongenetic confounders, we propose a hybrid method that combines PCR and LMM and thus inherits the strengths of both. We use real genotype data and simulated phenotypes to confirm the above points and to establish the superior performance of the hybrid method across all scenarios.
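The abstract contrasts three ways of testing a SNP under population structure. The following is a minimal sketch in Python with NumPy, written as our own illustration under stated assumptions and not as the authors' code. PCR regresses the trait on the SNP plus the top k PCs by ordinary least squares. The LMM uses a genetic relationship matrix K = Z Z^T / m built from standardized genotypes Z, picks the variance ratio delta = s_e^2 / s_g^2 by a grid search on the maximum-likelihood (not REML) objective under the null, and then tests the SNP by generalized least squares with delta held fixed (an EMMAX-style shortcut). The hybrid is assumed here to be that same LMM with the top k PCs added as fixed-effect covariates. All function names (standardize, top_pcs, wald_t, pcr_test, lmm_test, hybrid_test), the delta grid, and the simulated data (two subpopulations and an environmental shift confined to one of them) are hypothetical.

```python
import numpy as np

def standardize(G):
    """Center and scale each SNP column (assumes no monomorphic SNPs)."""
    return (G - G.mean(axis=0)) / G.std(axis=0)

def top_pcs(Z, k):
    """Scores on the top-k principal components of the standardized genotypes Z."""
    U, s, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k] * s[:k]

def wald_t(y, X, g):
    """OLS t statistic for g after adjusting for the columns of X (caller supplies any intercept)."""
    D = np.column_stack([X, g])
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    sigma2 = resid @ resid / (len(y) - D.shape[1])
    cov = sigma2 * np.linalg.inv(D.T @ D)
    return beta[-1] / np.sqrt(cov[-1, -1])

def pcr_test(y, Z, g, k):
    """PCR: the top-k PCs enter as fixed-effect covariates."""
    X = np.column_stack([np.ones(len(y)), top_pcs(Z, k)])
    return wald_t(y, X, g)

def lmm_test(y, Z, g, X=None, grid=np.logspace(-3, 3, 61)):
    """Hypothetical LMM test: y = X b + g*gamma + u + e, u ~ N(0, s_g^2 K), e ~ N(0, s_e^2 I).

    delta = s_e^2 / s_g^2 maximizes the null ML likelihood over `grid`, then is held
    fixed while g is tested by GLS (EMMAX-style). Returns (t statistic, delta).
    """
    n, m = Z.shape
    if X is None:
        X = np.ones((n, 1))
    K = Z @ Z.T / m                          # genetic relationship matrix (includes g itself)
    lam, S = np.linalg.eigh(K)
    lam = np.clip(lam, 0.0, None)            # remove tiny negative round-off
    yt, Xt, gt = S.T @ y, S.T @ X, S.T @ g   # rotate so Cov(y) becomes diagonal

    def neg_loglik(delta):
        w = 1.0 / (lam + delta)
        sw = np.sqrt(w)
        b, *_ = np.linalg.lstsq(Xt * sw[:, None], yt * sw, rcond=None)
        r = yt - Xt @ b
        s2 = np.sum(w * r**2) / n            # profiled ML estimate of s_g^2
        return 0.5 * (n * np.log(s2) + np.sum(np.log(lam + delta)))

    delta = min(grid, key=neg_loglik)
    sw = np.sqrt(1.0 / (lam + delta))        # GLS = OLS on whitened, rotated data
    return wald_t(yt * sw, Xt * sw[:, None], gt * sw), delta

def hybrid_test(y, Z, g, k):
    """Assumed hybrid form: the LMM above with the top-k PCs added as fixed effects."""
    X = np.column_stack([np.ones(len(y)), top_pcs(Z, k)])
    return lmm_test(y, Z, g, X=X)

# Simulated data: two subpopulations plus an environmental effect confined to one of them.
rng = np.random.default_rng(0)
n, m = 200, 500
pop = np.repeat([0, 1], n // 2)
p_anc = rng.uniform(0.1, 0.9, size=m)
p_pop = np.clip(p_anc + rng.normal(0.0, 0.1, size=(2, m)), 0.05, 0.95)
G = rng.binomial(2, p_pop[pop]).astype(float)   # n x m genotypes coded 0/1/2
G = G[:, G.std(axis=0) > 0]                      # drop any monomorphic SNPs
Z = standardize(G)
env = 1.0 * (pop == 1)                           # unmeasured, spatially confined confounder
y = Z[:, 0] + env + rng.normal(size=n)           # SNP 0 is causal

t_pcr = pcr_test(y, Z, Z[:, 0], k=2)
t_lmm, delta_lmm = lmm_test(y, Z, Z[:, 0])
t_hyb, delta_hyb = hybrid_test(y, Z, Z[:, 0], k=2)
print(f"PCR t={t_pcr:.2f}  LMM t={t_lmm:.2f}  hybrid t={t_hyb:.2f}")
```

In this sketch PCR and the hybrid share the same PC covariates; the only difference is whether the remaining genome-wide similarity also enters as a random effect through K. A single run like this one only shows that the code executes; checking type I error would require testing null SNPs across many simulated replicates, and nothing here is claimed to reproduce the paper's results.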
Pages: 149-155
Number of pages: 7