Clustering data with non-ignorable missingness using semi-parametric mixture models assuming independence within components
Cited by: 4
Authors:
de Chaumaray, Marie du Roy [1]
Marbac, Matthieu [1]
Affiliations:
[1] Univ Rennes, Ensai, CNRS, CREST UMR 9194, F-35000 Rennes, France
Keywords:
Clustering;
Mixture model;
Non-ignorable missingness;
Smoothed likelihood;
NONPARAMETRIC-ESTIMATION;
MULTIVARIATE;
DISTRIBUTIONS;
LIKELIHOOD;
DOI:
10.1007/s11634-023-00534-w
Chinese Library Classification:
O21 [Probability Theory and Mathematical Statistics];
C8 [Statistics];
Subject classification codes:
020208 ;
070103 ;
0714 ;
Abstract:
We propose a semi-parametric clustering model assuming conditional independence given the component. One advantage is that this model can handle non-ignorable missingness. The model defines each component as a product of univariate probability distributions but makes no assumption on the form of each univariate density. Note that the mixture model is used for clustering but not for estimating the density of the full variables (observed and unobserved). Estimation is performed by maximizing an extension of the smoothed likelihood allowing missingness. This optimization is achieved by a Majorization-Minorization algorithm. We illustrate the relevance of our approach by numerical experiments conducted on simulated data. Under mild assumptions, we show the identifiability of the model defining the distribution of the observed data and the monotonicity of the algorithm. We also propose an extension of this new method to the case of mixed-type data that we illustrate on a real data set. The proposed method is implemented in the R package MNARclust available on CRAN.
Pages: 1081-1122
Number of pages: 42
Related papers