Model-Based Clustering for Conditionally Correlated Categorical Data

Authors
Matthieu Marbac
Christophe Biernacki
Vincent Vandewalle
Affiliations
[1] Inria Lille and DGA
[2] University Lille 1
[3] CNRS and Inria
[4] University Lille 2 and Inria
Source
Journal of Classification, 2015, vol. 32
Keywords
Categorical data; Clustering; Correlation; Expectation-Maximization algorithm; Gibbs sampler; Mixture model; Model selection
Abstract
An extension of the latent class model is presented for clustering categorical data by relaxing the classical “class conditional independence assumption” of the variables. The model groups the variables into blocks that are independent of one another but internally dependent, in order to capture the main intra-class correlations. Within each class, the dependency between variables of the same block is modeled by mixing two extreme distributions: independence and maximum dependency. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model, and it yields a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood using an EM algorithm. Moreover, a Gibbs sampler is used for model selection to overcome the computational intractability of the combinatorial search over block structures. Two applications to medical and biological data sets show the relevance of this new model. The results support the view that the model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model.
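The block-level mixture described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the maximum-dependency distribution is represented by deterministic one-to-one mappings from the first variable of a block onto the other variables, and all function and parameter names (block_prob, rho, alphas, tau, deltas) are hypothetical. The paper's exact parameterization may differ. The posterior function computes what the E-step of an EM algorithm would need for one observation; the M-step and the Gibbs sampler used for block-structure search are not shown.

```python
from math import prod


def independence_prob(x, alphas):
    """P(x) when the block's variables are independent given the class.

    alphas[j][h] is the probability that the j-th variable of the block takes modality h.
    """
    return prod(alphas[j][h] for j, h in enumerate(x))


def max_dependency_prob(x, tau, deltas):
    """Hypothetical maximum-dependency distribution (an assumption, not the paper's exact form).

    The first variable takes modality h with probability tau[h]; every other variable j
    is then fixed to deltas[j - 1][h], a one-to-one mapping of h. Any x that breaks
    these mappings gets probability zero.
    """
    h = x[0]
    for j, mapping in enumerate(deltas, start=1):
        if x[j] != mapping[h]:
            return 0.0
    return tau[h]


def block_prob(x, rho, alphas, tau, deltas):
    """Mix the two extremes: rho = 0 is independence, rho = 1 is maximum dependency."""
    return (1 - rho) * independence_prob(x, alphas) + rho * max_dependency_prob(x, tau, deltas)


def class_prob(x, blocks):
    """Blocks are independent given the class, so their probabilities multiply.

    blocks is a list of (variable_indices, block_parameters) pairs.
    """
    return prod(block_prob([x[i] for i in idx], **params) for idx, params in blocks)


def posterior(x, weights, classes):
    """E-step style class membership probabilities for one observation x."""
    joint = [w * class_prob(x, blocks) for w, blocks in zip(weights, classes)]
    total = sum(joint)
    return [p / total for p in joint]


# Toy setting (hypothetical): two binary variables forming a single block.
pair = [0, 1]
uniform = [[0.5, 0.5], [0.5, 0.5]]
params_a = {"rho": 0.8, "alphas": uniform, "tau": [0.5, 0.5], "deltas": [[0, 1]]}
params_b = {"rho": 0.0, "alphas": uniform, "tau": [0.5, 0.5], "deltas": [[0, 1]]}
classes = [[(pair, params_a)], [(pair, params_b)]]
weights = [0.5, 0.5]

print(posterior([1, 1], weights, classes))  # roughly [0.643, 0.357]
```

With rho = 0.8, class A concentrates its block mass on the matching cells (0, 0) and (1, 1), so observing (1, 1) tilts the posterior toward it. Setting rho = 0 for every block recovers the classical latent class model, with full independence within the class.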
Pages: 145-175
Page count: 30