Principal component analysis with missing values: a comparative survey of methods

Cited by: 0
Authors
Stéphane Dray
Julie Josse
Affiliations
[1] Université de Lyon, Applied Mathematics Department
[2] Université Lyon 1
[3] CNRS
[4] UMR5558
[5] Laboratoire de Biométrie et Biologie Evolutive
[6] Agrocampus Ouest
Source
Plant Ecology | 2015 / Vol. 216
Keywords
Imputation; Ordination; PCA; Traits
DOI
Not available
Abstract
Principal component analysis (PCA) is a standard technique to summarize the main structures of a data table containing the measurements of several quantitative variables for a number of individuals. Here, we study the case where some of the data values are missing and review methods that adapt PCA to missing data. In plant ecology, this statistical challenge relates to the current effort to compile global plant functional trait databases, which produces matrices with a large proportion of missing values. We present several techniques to account for or estimate (impute) missing values in PCA and compare them using theoretical considerations. We carried out a simulation study to evaluate the relative merits of the different approaches in various situations (correlation structure, number of variables and individuals, and percentage of missing values) and also applied them to a real data set. Lastly, we discuss the advantages and drawbacks of these approaches, the potential pitfalls, and the challenges that remain to be addressed.
Pages: 657 - 667
Number of pages: 10