Principal component analysis with missing values: a comparative survey of methods

Cited by: 0
Authors
Stéphane Dray
Julie Josse
Affiliations
[1] Université de Lyon, Applied Mathematics Department
[2] Université Lyon 1
[3] CNRS
[4] UMR5558
[5] Laboratoire de Biométrie et Biologie Evolutive
[6] Agrocampus Ouest
Source
Plant Ecology | 2015 / Vol. 216
Keywords
Imputation; Ordination; PCA; Traits
DOI
Not available
Abstract
Principal component analysis (PCA) is a standard technique for summarizing the main structures of a data table containing measurements of several quantitative variables for a number of individuals. Here, we study the case where some of the data values are missing and review methods that adapt PCA to missing data. In plant ecology, this statistical challenge relates to the current effort to compile global plant functional trait databases, which produces matrices with a large number of missing values. We present several techniques to handle or estimate (impute) missing values in PCA and compare them using theoretical considerations. We carried out a simulation study to evaluate the relative merits of the different approaches in various situations (correlation structure, number of variables and individuals, and percentage of missing values) and also applied them to a real data set. Lastly, we discuss the advantages and drawbacks of these approaches, the potential pitfalls, and the challenges that remain to be addressed.
Pages: 657-667
Number of pages: 10
Related papers
50 in total
  • [41] A Study on Bayesian Principal Component Analysis for Addressing Missing Rainfall Data
    Wai Yan Lai
    K. K. Kuok
    Water Resources Management, 2019, 33 : 2615 - 2628
  • [42] MEASURING THE VALUES OF WATER RESOURCES: AN APPLICATION OF PRINCIPAL COMPONENT ANALYSIS
    Matsiori, S.
    Neofitou, C.
    Aggelopoulos, S.
    Soutsas, K.
    JOURNAL OF ENVIRONMENTAL PROTECTION AND ECOLOGY, 2013, 14 (02): : 781 - 785
  • [43] A Study on Bayesian Principal Component Analysis for Addressing Missing Rainfall Data
    Lai, Wai Yan
    Kuok, K. K.
    WATER RESOURCES MANAGEMENT, 2019, 33 (08) : 2615 - 2628
  • [44] Comparative Performance Analysis of Three Algorithms for Principal Component Analysis
    Landqvist, Ronnie
    Mohammed, Abbas
    RADIOENGINEERING, 2006, 15 (04) : 84 - 90
  • [45] Missing values in multi-level simultaneous component analysis
    Josse, Julie
    Timmerman, Marieke E.
    Kiers, Henk A. L.
    CHEMOMETRICS AND INTELLIGENT LABORATORY SYSTEMS, 2013, 129 : 21 - 32
  • [46] COMPARATIVE ELUCIDATION OF GARLIC PEELING METHODS AND POSITIONING OF QUALITY CHARACTERISTICS USING PRINCIPAL COMPONENT ANALYSIS
    Prakash, Prem
    Prasad, Kamlesh
    ACTA SCIENTIARUM POLONORUM-TECHNOLOGIA ALIMENTARIA, 2023, 22 (02) : 119 - 131
  • [47] Experimental analysis of methods for imputation of missing values in databases
    Farhangfar, A
    Kurgan, L
    Pedrycz, W
    INTELLIGENT COMPUTING: THEORY AND APPLICATIONS II, 2004, 5421 : 172 - 182
  • [48] Methods for repeated measures data analysis with missing values
    Carriere, KC
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 1999, 77 (02) : 221 - 236
  • [49] Comparison of sparse Kernel Principal Component Analysis methods
    Gou, Zhen Kun
    Feng, JunKang
    Fyfe, Colin
    International Conference on Knowledge-Based Intelligent Electronic Systems, Proceedings, KES, 2000, 1 : 309 - 312
  • [50] A comparison of sparse kernel principal component analysis methods
    Gou, ZK
    Feng, JK
    Fyfe, C
    KES'2000: FOURTH INTERNATIONAL CONFERENCE ON KNOWLEDGE-BASED INTELLIGENT ENGINEERING SYSTEMS & ALLIED TECHNOLOGIES, VOLS 1 AND 2, PROCEEDINGS, 2000, : 309 - 312