A novel minorization-maximization framework for simultaneous feature selection and clustering of high-dimensional count data

Times cited: 1
Authors
Zamzami, Nuha [1]
Bouguila, Nizar [2]
Affiliations
[1] Univ Jeddah, Coll Comp Sci & Engn, Dept Comp Sci & Artificial Intelligence, Jeddah, Saudi Arabia
[2] Concordia Univ, Concordia Inst Informat Syst Engn CIISE, Montreal, PQ, Canada
Keywords
Feature saliency; Feature selection; Model selection; Unsupervised learning; Count data; Mixture models; Generalized Dirichlet multinomial; Maximum likelihood; Minorization-maximization; UNSUPERVISED FEATURE-SELECTION; DISCRIMINANT-ANALYSIS; MAXIMUM-LIKELIHOOD; MODEL SELECTION; ALGORITHM; CLASSIFICATION; MIXTURES;
DOI
10.1007/s10044-022-01094-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Count data are commonly exploited in machine learning and computer vision applications; however, they often suffer from the well-known curse of dimensionality, which dramatically degrades the performance of clustering algorithms. Feature selection is a major technique for handling a large number of features, most of which are often redundant or noisy. In this paper, we propose a probabilistic approach for count data based on the concept of feature saliency in the context of mixture-based clustering using the generalized Dirichlet multinomial distribution. The saliency of irrelevant features is driven toward zero by minimizing the message length, which amounts to performing feature selection and model selection simultaneously. Using a range of challenging applications, including text and image clustering, the developed approach is shown to be effective in identifying both the optimal number of clusters and the most important features, and thereby to improve clustering performance significantly.
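The feature-saliency idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the general form often used for feature saliency in mixture models, where each feature d contributes rho_d * p(x_d | component) + (1 - rho_d) * q(x_d), with q a background density shared by all clusters. The function name `saliency_mixture_loglik` and its arguments are hypothetical, and the per-feature log-probabilities are passed in as plain arrays rather than computed from a generalized Dirichlet multinomial.

```python
import numpy as np

def saliency_mixture_loglik(log_p_rel, log_q_irr, weights, saliency):
    """Log-likelihood of a mixture with per-feature saliencies (illustrative only).

    log_p_rel : (N, K, D) log-probability of feature d of sample n under component k
    log_q_irr : (N, D)    log-probability of feature d under a shared background density
    weights   : (K,)      mixing weights, summing to 1
    saliency  : (D,)      rho_d in [0, 1]; rho_d near 0 marks feature d as irrelevant
    """
    log_p_rel = np.asarray(log_p_rel, dtype=float)
    log_q_irr = np.asarray(log_q_irr, dtype=float)
    weights = np.asarray(weights, dtype=float)
    saliency = np.asarray(saliency, dtype=float)

    # log(0) = -inf is intended here: it switches the corresponding term off.
    with np.errstate(divide="ignore"):
        log_rho = np.log(saliency)
        log_one_minus_rho = np.log1p(-saliency)
        log_w = np.log(weights)

    # Per feature: log(rho_d * p + (1 - rho_d) * q), computed in log space.
    relevant = log_rho + log_p_rel                           # (N, K, D)
    irrelevant = (log_one_minus_rho + log_q_irr)[:, None, :]  # (N, 1, D)
    per_feature = np.logaddexp(relevant, irrelevant)          # (N, K, D)

    # Features are treated as independent within a component in this sketch.
    per_component = log_w + per_feature.sum(axis=2)           # (N, K)

    # Sum over components in log space, then over samples.
    return float(np.logaddexp.reduce(per_component, axis=1).sum())
```

With every saliency set to 0 the clusters become indistinguishable and the likelihood depends only on the background density; with every saliency set to 1 the expression reduces to an ordinary mixture likelihood. A criterion that pushes rho_d toward 0 for a feature therefore removes that feature's influence on the clustering.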
Pages: 91-106
Number of pages: 16
Related papers
50 in total
  • [1] A novel minorization–maximization framework for simultaneous feature selection and clustering of high-dimensional count data
    Nuha Zamzami
    Nizar Bouguila
    Pattern Analysis and Applications, 2023, 26 : 91 - 106
  • [2] Mixture-based clustering for count data using approximated Fisher Scoring and Minorization-Maximization approaches
    Bregu, Ornela
    Zamzami, Nuha
    Bouguila, Nizar
    COMPUTATIONAL INTELLIGENCE, 2021, 37 (01) : 596 - 620
  • [3] Clustering high-dimensional data via feature selection
    Liu, Tianqi
    Lu, Yu
    Zhu, Biqing
    Zhao, Hongyu
    BIOMETRICS, 2023, 79 (02) : 940 - 950
  • [4] Simultaneous Feature and Model Selection for High-Dimensional Data
    Perolini, Alessandro
    Guerif, Sebastien
    2011 23RD IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2011), 2011, : 47 - 50
  • [5] Simultaneous Feature Selection and Classification for High-Dimensional Data
    Pai, Vriddhi
    Gupta, Subhash Chand
    PROCEEDINGS OF THE SECOND INTERNATIONAL CONFERENCE ON GREEN COMPUTING AND INTERNET OF THINGS (ICGCIOT 2018), 2018, : 153 - 158
  • [6] On online high-dimensional spherical data clustering and feature selection
    Amayri, Ola
    Bouguila, Nizar
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2013, 26 (04) : 1386 - 1398
  • [7] A general framework of nonparametric feature selection in high-dimensional data
    Yu, Hang
    Wang, Yuanjia
    Zeng, Donglin
    BIOMETRICS, 2023, 79 (02) : 951 - 963
  • [8] A GA-based Feature Selection for High-dimensional Data Clustering
    Sun, Mei
    Xiong, Langhuan
    Sun, Haojun
    Jiang, Dazhi
    THIRD INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTING, 2009, : 769 - 772
  • [9] Feature selection for high-dimensional data
    Bolón-Canedo V.
    Sánchez-Maroño N.
    Alonso-Betanzos A.
    Progress in Artificial Intelligence, 2016, 5 (2) : 65 - 75
  • [10] Feature selection for high-dimensional data
    Destrero A.
    Mosci S.
    De Mol C.
    Verri A.
    Odone F.
    Computational Management Science, 2009, 6 (1) : 25 - 40