Clustering data with non-ignorable missingness using semi-parametric mixture models assuming independence within components

Cited by: 4
Authors
de Chaumaray, Marie du Roy [1]
Marbac, Matthieu [1]
Affiliations
[1] Univ Rennes, Ensai, CNRS, CREST UMR 9194, F-35000 Rennes, France
Keywords
Clustering; Mixture model; Non-ignorable missingness; Smoothed likelihood; NONPARAMETRIC-ESTIMATION; MULTIVARIATE; DISTRIBUTIONS; LIKELIHOOD
DOI
10.1007/s11634-023-00534-w
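Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract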
We propose a semi-parametric clustering model assuming conditional independence given the component. One advantage is that this model can handle non-ignorable missingness. The model defines each component as a product of univariate probability distributions but makes no assumption on the form of each univariate density. Note that the mixture model is used for clustering but not for estimating the density of the full variables (observed and unobserved). Estimation is performed by maximizing an extension of the smoothed likelihood allowing missingness. This optimization is achieved by a Majorization-Minorization algorithm. We illustrate the relevance of our approach by numerical experiments conducted on simulated data. Under mild assumptions, we show the identifiability of the model defining the distribution of the observed data and the monotonicity of the algorithm. We also propose an extension of this new method to the case of mixed-type data that we illustrate on a real data set. The proposed method is implemented in the R package MNARclust available on CRAN.
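The clustering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' estimator and not the MNARclust API. It assumes, purely for illustration, that each univariate component density is Gaussian (the paper leaves these densities unspecified) and that the probability of a value being missing depends only on the component (the paper's mechanism may be more general). The function name `responsibilities` and all parameter names are hypothetical. Under these assumptions, each variable contributes either its observed density or its missingness probability to the component's score, so the missingness pattern itself carries information about cluster membership.

```python
import math


def responsibilities(x, weights, means, sds, miss_prob):
    """Posterior cluster probabilities for one partially observed row.

    x         : list of floats, with None marking a missing entry
    weights   : mixing proportions, one per component
    means/sds : per-component, per-variable Gaussian parameters
                (an illustrative stand-in for unspecified densities)
    miss_prob : per-component, per-variable probability of being missing
                (assumed here to depend on the component only)
    """
    log_scores = []
    for k, w in enumerate(weights):
        lp = math.log(w)
        # Conditional independence: the component score is a product
        # over variables, i.e. a sum in log space.
        for j, xj in enumerate(x):
            if xj is None:
                # A missing entry contributes its missingness probability.
                lp += math.log(miss_prob[k][j])
            else:
                z = (xj - means[k][j]) / sds[k][j]
                log_density = -0.5 * z * z - math.log(
                    sds[k][j] * math.sqrt(2.0 * math.pi)
                )
                lp += math.log(1.0 - miss_prob[k][j]) + log_density
        log_scores.append(lp)

    # Normalise with the log-sum-exp trick for numerical stability.
    m = max(log_scores)
    unnorm = [math.exp(v - m) for v in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Two components; the second variable is missing far more often in component 1.
weights = [0.5, 0.5]
means = [[0.0, 0.0], [5.0, 5.0]]
sds = [[1.0, 1.0], [1.0, 1.0]]
miss_prob = [[0.1, 0.1], [0.1, 0.8]]

# The first variable sits exactly between the two means, so only the
# missingness of the second variable separates the clusters.
post = responsibilities([2.5, None], weights, means, sds, miss_prob)
```

In this toy setting the observed value is equally likely under both components, so the posterior is driven entirely by the missingness probabilities 0.1 and 0.8. Ignoring the missingness pattern would discard that information.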
Pages: 1081-1122
Number of pages: 42