Interactive information bottleneck for high-dimensional co-occurrence data clustering

Cited by: 5
Authors
Hu, Shizhe [1]
Wang, Ruobin [1]
Ye, Yangdong [1]
Affiliations
[1] Zhengzhou University, School of Information Engineering, Zhengzhou 450001, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Clustering; High-dimensional data; Information bottleneck; Mixture model; Feature selection
DOI
10.1016/j.asoc.2021.107837
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Clustering high-dimensional data is challenging because the features contain a great deal of redundant and irrelevant information. Most existing methods perform feature dimensionality reduction and data clustering either sequentially or jointly, with clustering carried out on the low-dimensional representations. However, these methods neglect the relationships between the clustered data points and the dimension-reduced features, as well as the influence of those relationships on learning the low-dimensional feature subspace. In this paper, an embarrassingly simple yet effective interactive information bottleneck (IIB) method is proposed for clustering high-dimensional co-occurrence data; it performs data clustering and low-dimensional feature subspace learning simultaneously. Unlike existing methods, IIB clusters the data while maximally preserving the correlations between the data clusters and the learned dimension-reduced features, and it simultaneously learns the low-dimensional feature subspace while maintaining the correlations with the clustering results obtained in the previous iteration. The two stages thus interact and refine each other. Finally, a new twin "draw-and-merge" method is designed for optimization. Experimental results on four high-dimensional datasets demonstrate the superiority and effectiveness of the proposed method. © 2021 Elsevier B.V. All rights reserved.
Pages: 11