Clustering data with non-ignorable missingness using semi-parametric mixture models assuming independence within components

Cited by: 4
Authors
de Chaumaray, Marie du Roy [1]
Marbac, Matthieu [1]
Affiliations
[1] Univ Rennes, Ensai, CNRS, CREST UMR 9194, F-35000 Rennes, France
Keywords
Clustering; Mixture model; Non-ignorable missingness; Smoothed likelihood; Nonparametric estimation; Multivariate; Distributions; Likelihood
DOI
10.1007/s11634-023-00534-w
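Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract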
We propose a semi-parametric clustering model assuming conditional independence given the component. One advantage is that this model can handle non-ignorable missingness. The model defines each component as a product of univariate probability distributions but makes no assumption on the form of each univariate density. Note that the mixture model is used for clustering but not for estimating the density of the full variables (observed and unobserved). Estimation is performed by maximizing an extension of the smoothed likelihood allowing missingness. This optimization is achieved by a Majorization-Minorization algorithm. We illustrate the relevance of our approach by numerical experiments conducted on simulated data. Under mild assumptions, we show the identifiability of the model defining the distribution of the observed data and the monotonicity of the algorithm. We also propose an extension of this new method to the case of mixed-type data that we illustrate on a real data set. The proposed method is implemented in the R package MNARclust available on CRAN.
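As a rough illustration of the model structure described in the abstract, the conditional-independence mixture can be written as below. The symbols ($K$ components, proportions $\pi_k$, univariate densities $f_{kj}$, observation indicators $r_j$, per-component observation probabilities $\tau_{kj}$, densities $p_{kj}$ of an observed coordinate) are introduced here for illustration only. The second display is one plausible form consistent with the abstract, assuming the probability of observing a variable depends on the component; it is not necessarily the authors' exact parametrization.

```latex
% Mixture with independence within components:
% each component is a product of univariate densities.
f(x) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{d} f_{kj}(x_j),
\qquad \pi_k > 0, \quad \sum_{k=1}^{K} \pi_k = 1 .

% Assumed form for the observed data (x^{obs}, r), where r_j = 1 if x_j is
% observed and r_j = 0 otherwise; missingness is non-ignorable here because
% \tau_{kj} varies with the (latent) component k.
g(x^{\mathrm{obs}}, r) = \sum_{k=1}^{K} \pi_k \prod_{j=1}^{d}
  \bigl( \tau_{kj} \, p_{kj}(x_j) \bigr)^{r_j} \, (1 - \tau_{kj})^{1 - r_j} .
```

Under this sketch, clustering only needs $g$, the distribution of what is actually observed, which matches the abstract's remark that the mixture is used for clustering rather than for estimating the density of the full set of variables.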
Pages: 1081 - 1122
Page count: 42
Related papers
29 in total
  • [1] Clustering data with non-ignorable missingness using semi-parametric mixture models assuming independence within components
    Marie du Roy de Chaumaray
    Matthieu Marbac
    Advances in Data Analysis and Classification, 2023, 17 : 1081 - 1122
  • [2] Semi-parametric methods of handling missing data in mortal cohorts under non-ignorable missingness
    Wen, Lan
    Seaman, Shaun R.
    BIOMETRICS, 2018, 74 (04) : 1427 - 1437
  • [3] A Semi-parametric Approach for Analyzing Longitudinal Measurements with Non-ignorable Missingness Using Regression Spline
    Baghfalaki, Taban
    Sefidi, Saeide
    Ganjali, Mojtaba
    APPLICATIONS AND APPLIED MATHEMATICS-AN INTERNATIONAL JOURNAL, 2015, 10 (01) : 195 - 211
  • [4] Analysis of semi-parametric regression models with non-ignorable non-response
    Rotnitzky, A
    Robins, J
    STATISTICS IN MEDICINE, 1997, 16 (1-3) : 81 - 102
  • [5] Robust growth mixture models with non-ignorable missingness: Models, estimation, selection, and application
    Lu, Zhenqiu
    Zhang, Zhiyong
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2014, 71 : 220 - 240
  • [6] Semi-parametric Selection Models for Potentially Non-ignorable Attrition in Panel Studies with Refreshment Samples
    Si, Yajuan
    Reiter, Jerome P.
    Hillygus, D. Sunshine
    POLITICAL ANALYSIS, 2015, 23 (01) : 92 - 112
  • [7] Bayesian Models for Analysis of Inventory and Monitoring Data with Non-ignorable Missingness
    Luke J. Zachmann
    Erin M. Borgman
    Dana L. Witwicki
    Megan C. Swan
    Cheryl McIntyre
    N. Thompson Hobbs
    Journal of Agricultural, Biological and Environmental Statistics, 2022, 27 : 125 - 148
  • [8] Bayesian Models for Analysis of Inventory and Monitoring Data with Non-ignorable Missingness
    Zachmann, Luke J.
    Borgman, Erin M.
    Witwicki, Dana L.
    Swan, Megan C.
    McIntyre, Cheryl
    Hobbs, N. Thompson
    JOURNAL OF AGRICULTURAL BIOLOGICAL AND ENVIRONMENTAL STATISTICS, 2022, 27 (01) : 125 - 148
  • [9] A reanalysis of a longitudinal scleroderma clinical trial using non-ignorable missingness models
    Boscardin, W. John
    Yan, Xiaohong
    Wong, Weng Kee
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2007, 137 (12) : 3848 - 3858
  • [10] Semi-parametric estimation for conditional independence multivariate finite mixture models
    Chauveau, Didier
    Hunter, David R.
    Levine, Michael
    STATISTICS SURVEYS, 2015, 9 : 1 - 31