Clustering data with non-ignorable missingness using semi-parametric mixture models assuming independence within components

Times cited: 4

Authors:

de Chaumaray, Marie du Roy [1]

Marbac, Matthieu [1]

Affiliation:

[1] Univ Rennes, Ensai, CNRS, CREST UMR 9194, F-35000 Rennes, France

Source:

ADVANCES IN DATA ANALYSIS AND CLASSIFICATION | 2023 / Vol. 17 / No. 4

Keywords:

Clustering; Mixture model; Non-ignorable missingness; Smoothed likelihood; NONPARAMETRIC-ESTIMATION; MULTIVARIATE; DISTRIBUTIONS; LIKELIHOOD

DOI:

10.1007/s11634-023-00534-w

Chinese Library Classification (CLC) numbers:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]

Subject classification codes:

020208; 070103; 0714

Abstract:

We propose a semi-parametric clustering model assuming conditional independence given the component. One advantage is that this model can handle non-ignorable missingness. The model defines each component as a product of univariate probability distributions but makes no assumption on the form of each univariate density. Note that the mixture model is used for clustering but not for estimating the density of the full variables (observed and unobserved). Estimation is performed by maximizing an extension of the smoothed likelihood allowing missingness. This optimization is achieved by a Minorization-Maximization (MM) algorithm. We illustrate the relevance of our approach by numerical experiments conducted on simulated data. Under mild assumptions, we show the identifiability of the model defining the distribution of the observed data and the monotonicity of the algorithm. We also propose an extension of this new method to the case of mixed-type data that we illustrate on a real data set. The proposed method is implemented in the R package MNARclust available on CRAN.
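
The abstract describes the estimation procedure only at a high level. The Python sketch below illustrates, under our own assumptions, how an MM iteration on a smoothed likelihood can accommodate missing entries in a conditionally independent mixture. It assumes (this is not stated in the record) that variable j is missing in component k with a probability tau[k][j], so that missingness depends on the latent cluster. It also uses the nonlinear smoothing operator of maximum smoothed likelihood estimation, N f(x) = exp(integral of K_h(x - u) log f(u) du), discretised on a fixed grid with a Gaussian kernel and a fixed Silverman bandwidth. The function `msl_mixture_with_missing` and all of its parameters are hypothetical; this is not the MNARclust implementation or its API.

```python
import math
import random
import statistics


def gauss_kernel(u, h):
    """Gaussian kernel K_h(u) with bandwidth h."""
    return math.exp(-0.5 * (u / h) ** 2) / (h * math.sqrt(2.0 * math.pi))


def msl_mixture_with_missing(X, K, n_iter=30, m=80, seed=0):
    """Hypothetical sketch: smoothed-likelihood MM for a conditionally independent
    mixture where the probability tau[k][j] that variable j is missing (None)
    depends on component k (an assumption made for illustration).

    Returns (pi, tau, W, trace): proportions, missingness probabilities,
    posterior memberships and the discretised smoothed log-likelihood per iteration.
    """
    n, p = len(X), len(X[0])
    R = [[1 if x is None else 0 for x in row] for row in X]  # 1 = missing
    grids, delta, hs = [], [], []
    for j in range(p):
        obs = [row[j] for row in X if row[j] is not None]
        h = 1.06 * statistics.stdev(obs) * len(obs) ** -0.2  # Silverman's rule, kept fixed
        lo, hi = min(obs) - 4 * h, max(obs) + 4 * h
        grids.append([lo + (hi - lo) * g / (m - 1) for g in range(m)])
        delta.append((hi - lo) / (m - 1))
        hs.append(h)
    # A[j][i][g] = delta_j * K_h(x_ij - u_g): Riemann weights of the smoothing operator
    A = [[[delta[j] * gauss_kernel(X[i][j] - u, hs[j]) for u in grids[j]]
          if not R[i][j] else None for i in range(n)] for j in range(p)]

    def m_step(W):
        pi = [sum(W[i][k] for i in range(n)) / n for k in range(K)]
        tau, logf = [], []
        for k in range(K):
            wk = sum(W[i][k] for i in range(n))
            # tau_kj: weighted share of missing entries, clamped away from 0 and 1
            tau.append([min(max(sum(W[i][k] * R[i][j] for i in range(n)) / wk, 1e-6), 1 - 1e-6)
                        for j in range(p)])
            # log f_kj on the grid: weighted KDE of observed x_ij, normalised so sum_g delta*f = 1
            rows = []
            for j in range(p):
                c = [sum(W[i][k] * A[j][i][g] for i in range(n) if not R[i][j]) for g in range(m)]
                total = delta[j] * sum(c)
                rows.append([math.log(max(cg / total, 1e-300)) for cg in c])
            logf.append(rows)
        return pi, tau, logf

    def e_step(pi, tau, logf):
        W, ll = [], 0.0
        for i in range(n):
            la = []
            for k in range(K):
                s = math.log(pi[k])
                for j in range(p):
                    if R[i][j]:
                        s += math.log(tau[k][j])
                    else:  # log(1 - tau_kj) + log N f_kj(x_ij)
                        s += math.log(1 - tau[k][j]) + sum(
                            w * lf for w, lf in zip(A[j][i], logf[k][j]))
                la.append(s)
            mx = max(la)
            lse = mx + math.log(sum(math.exp(v - mx) for v in la))  # log-sum-exp
            ll += lse
            W.append([math.exp(v - lse) for v in la])
        return W, ll

    rng = random.Random(seed)
    W = []
    for _ in range(n):  # random soft initial memberships
        r = [rng.random() + 0.1 for _ in range(K)]
        s = sum(r)
        W.append([v / s for v in r])
    trace = []
    for _ in range(n_iter):
        pi, tau, logf = m_step(W)
        W, ll = e_step(pi, tau, logf)
        trace.append(ll)
    return pi, tau, W, trace
```

Because each density is normalised on the same fixed grid and every update maximizes a Jensen minorizer of the discretised objective, `trace` should be non-decreasing in this sketch. That only mirrors, for the sketch's own objective, the monotonicity property stated in the abstract; the mixed-type extension is not covered here.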

Pages: 1081-1122

Number of pages: 42
