Model-Based Clustering for Conditionally Correlated Categorical Data

Cited by: 0
Authors
Matthieu Marbac
Christophe Biernacki
Vincent Vandewalle
Affiliations
[1] Inria Lille and DGA
[2] University Lille 1
[3] CNRS and Inria
[4] University Lille 2 and Inria
Source
Journal of Classification | 2015 / Vol. 32
Keywords
Categorical data; Clustering; Correlation; Expectation-Maximization algorithm; Gibbs sampler; Mixture model; Model selection
DOI
Not available
Abstract
An extension of the latent class model is presented for clustering categorical data by relaxing the classical “class conditional independence assumption” of variables. This model groups the variables into inter-independent and intra-dependent blocks in order to capture the main intra-class correlations. The dependency between variables grouped inside the same block of a class is taken into account by mixing two extreme distributions: independence and maximum dependency. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model, since it produces a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood using an EM algorithm. Moreover, a Gibbs sampler is used for model selection in order to overcome the computational intractability of the combinatorial problems raised by the block structure search. Two applications on medical and biological data sets show the relevance of this new model. The results strengthen the view that this model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model.
Pages: 145-175
Page count: 30
Related papers
50 in total
  • [41] Model-Based Clustering of Mixed Data With Sparse Dependence
    Choi, Young-Geun
    Ahn, Soohyun
    Kim, Jayoun
    IEEE ACCESS, 2023, 11 : 75945 - 75954
  • [42] Model-based clustering of Gaussian copulas for mixed data
    Marbac, Matthieu
    Biernacki, Christophe
    Vandewalle, Vincent
    COMMUNICATIONS IN STATISTICS-THEORY AND METHODS, 2017, 46 (23) : 11635 - 11656
  • [43] Penalized model-based clustering of complex functional data
    Pronello, Nicola
    Ignaccolo, Rosaria
    Ippoliti, Luigi
    Fontanella, Sara
    STATISTICS AND COMPUTING, 2023, 33
  • [44] Penalized model-based clustering of complex functional data
    Pronello, Nicola
    Ignaccolo, Rosaria
    Ippoliti, Luigi
    Fontanella, Sara
    STATISTICS AND COMPUTING, 2023, 33 (06)
  • [45] Scalable model-based clustering by working on data summaries
    Jin, HD
    Wong, ML
    Leung, KS
    THIRD IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2003 : 91 - 98
  • [46] Probabilistic model-based clustering of multivariate and sequential data
    Smyth, P
    ARTIFICIAL INTELLIGENCE AND STATISTICS 99, PROCEEDINGS, 1999 : 299 - 304
  • [47] Model-based clustering for multivariate partial ranking data
    Jacques, Julien
    Biernacki, Christophe
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2014, 149 : 201 - 217
  • [48] Model-based co-clustering for ordinal data
    Jacques, Julien
    Biernacki, Christophe
    COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2018, 123 : 101 - 115
  • [49] Model-Based Clustering of Inhomogeneous Paired Comparison Data
    Busse, Ludwig M.
    Buhmann, Joachim M.
    SIMILARITY-BASED PATTERN RECOGNITION, 2011, 7005 : 207 - 221
  • [50] Model-based co-clustering for functional data
    Ben Slimen, Yosra
    Allio, Sylvain
    Jacques, Julien
    NEUROCOMPUTING, 2018, 291 : 97 - 108