Interactive information bottleneck for high-dimensional co-occurrence data clustering

Cited by: 5
Authors
Hu, Shizhe [1]
Wang, Ruobin [1]
Ye, Yangdong [1]
Affiliations
[1] Zhengzhou Univ, Sch Informat Engn, Zhengzhou 450001, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Clustering; High-dimensional data; Information bottleneck; Mixture model; Feature selection;
DOI
10.1016/j.asoc.2021.107837
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering high-dimensional data is challenging because the features contain a large amount of redundant and irrelevant information. Most existing methods perform feature dimensionality reduction and data clustering on the low-dimensional representations, either sequentially or jointly. However, these methods neglect the relationships between the clustered data points and the dimension-reduced features, as well as the influence of those relationships on learning the low-dimensional feature subspace. This paper proposes an embarrassingly simple yet effective interactive information bottleneck (IIB) method for clustering high-dimensional co-occurrence data, which performs data clustering and low-dimensional feature subspace learning simultaneously. Unlike existing methods, IIB clusters the data while maximally preserving the correlations between the data clusters and the learned dimension-reduced features, and at the same time learns the low-dimensional feature subspace while maintaining its correlations with the clustering results obtained in the previous iteration. The two stages thus interact and refine each other. Finally, a new twin "draw-and-merge" method is designed for optimization. Experimental results on four high-dimensional datasets demonstrate the superiority and effectiveness of the proposed method. © 2021 Elsevier B.V. All rights reserved.
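The abstract speaks of clustering while "maximally preserving the correlations" between data clusters and features. In information bottleneck methods this kind of correlation is usually measured as mutual information on a co-occurrence table. The sketch below is not the authors' IIB algorithm or their "draw-and-merge" optimizer; it only illustrates, on a toy count matrix, how merging rows into clusters can keep or lose mutual information with the features. The function names `mutual_information` and `merge_rows` and the toy data are assumptions made for illustration.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of a 2-D table of non-negative counts."""
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()                # normalize counts to a joint distribution
    p_row = joint.sum(axis=1, keepdims=True)   # marginal over rows (data points or clusters)
    p_col = joint.sum(axis=0, keepdims=True)   # marginal over columns (features)
    mask = joint > 0                           # skip zero cells, since 0 * log(0) is taken as 0
    ratio = joint[mask] / (p_row @ p_col)[mask]
    return float(np.sum(joint[mask] * np.log(ratio)))

def merge_rows(counts, labels, n_clusters):
    """Sum the co-occurrence rows that share a cluster label."""
    counts = np.asarray(counts, dtype=float)
    merged = np.zeros((n_clusters, counts.shape[1]))
    np.add.at(merged, labels, counts)          # unbuffered add, so repeated labels accumulate
    return merged

# Toy co-occurrence counts (hypothetical): 4 data points x 4 features,
# with two obvious groups of rows.
counts = np.array([[4, 4, 0, 0],
                   [4, 4, 0, 0],
                   [0, 0, 4, 4],
                   [0, 0, 4, 4]])

i_full = mutual_information(counts)
i_good = mutual_information(merge_rows(counts, np.array([0, 0, 1, 1]), 2))
i_bad = mutual_information(merge_rows(counts, np.array([0, 1, 0, 1]), 2))

print(f"I(X;Y) before clustering: {i_full:.4f}")
print(f"I(T;Y), matching grouping: {i_good:.4f}")
print(f"I(T;Y), mismatched grouping: {i_bad:.4f}")
```

In this toy case the grouping that follows the block structure keeps all of the mutual information with the features, while the mismatched grouping keeps none. That difference is the quantity an information bottleneck clustering objective tries to preserve.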
Pages: 11
Related papers
50 in total
  • [41] Density Conscious Subspace Clustering for High-Dimensional Data
    Chu, Yi-Hong
    Huang, Jen-Wei
    Chuang, Kun-Ta
    Yang, De-Nian
    Chen, Ming-Syan
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2010, 22 (01) : 16 - 30
  • [42] Evolutionary Subspace Clustering Algorithm for High-Dimensional Data
    Nourashrafeddin, S. N.
    Arnold, Dirk V.
    Milios, Evangelos
    PROCEEDINGS OF THE FOURTEENTH INTERNATIONAL CONFERENCE ON GENETIC AND EVOLUTIONARY COMPUTATION COMPANION (GECCO'12), 2012, : 1497 - 1498
  • [43] Subspace clustering of high-dimensional data: a predictive approach
    Brian McWilliams
    Giovanni Montana
    Data Mining and Knowledge Discovery, 2014, 28 : 736 - 772
  • [44] Integrative clustering methods for high-dimensional molecular data
    Chalise, Prabhakar
    Koestler, Devin C.
    Bimali, Milan
    Yu, Qing
    Fridley, Brooke L.
    TRANSLATIONAL CANCER RESEARCH, 2014, 3 (03) : 202 - 216
  • [45] Iterative random projections for high-dimensional data clustering
    Cardoso, Angelo
    Wichert, Andreas
    PATTERN RECOGNITION LETTERS, 2012, 33 (13) : 1749 - 1755
  • [46] Clustering high-dimensional data via feature selection
    Liu, Tianqi
    Lu, Yu
    Zhu, Biqing
    Zhao, Hongyu
    BIOMETRICS, 2023, 79 (02) : 940 - 950
  • [47] Clustering algorithm of high-dimensional data based on units
    School of Information Engineering, Hubei Institute for Nationalities, Enshi 445000, China
    Jisuanji Yanjiu yu Fazhan, 2007, 9 (1618-1623): : 1618 - 1623
  • [48] A Hierarchical Model-based Approach to Co-Clustering High-Dimensional Data
    Costa, Gianni
    Manco, Giuseppe
    Ortale, Riccardo
    APPLIED COMPUTING 2008, VOLS 1-3, 2008, : 886 - 890
  • [49] Multiview Spectral Clustering of High-Dimensional Observational Data
    Roman-Messina, A.
    Castro-Arvizu, Claudia M.
    Castillo-Tapia, Alejandro
    Murillo-Aguirre, Erlan R.
    Rodriguez-Villalon, O.
    IEEE ACCESS, 2023, 11 : 115884 - 115893
  • [50] Fuzzy nearest neighbor clustering of high-dimensional data
    Wang, HB
    Yu, YQ
    Zhou, DR
    Meng, B
    2003 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-5, PROCEEDINGS, 2003, : 2569 - 2572