Interactive information bottleneck for high-dimensional co-occurrence data clustering

Cited by: 5
Authors
Hu, Shizhe [1 ]
Wang, Ruobin [1 ]
Ye, Yangdong [1 ]
Affiliations
[1] Zhengzhou Univ, Sch Informat Engn, Zhengzhou 450001, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Clustering; High-dimensional data; Information bottleneck; Mixture model; Feature selection
DOI
10.1016/j.asoc.2021.107837
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering high-dimensional data is challenging because the features contain a large amount of redundant and irrelevant information. Most existing methods perform feature dimensionality reduction and then cluster the low-dimensional representations, either sequentially or jointly. However, these methods neglect the relationships between the clustered data points and the dimension-reduced features, as well as the influence of these relationships on learning the low-dimensional feature subspace. This paper proposes an embarrassingly simple yet effective interactive information bottleneck (IIB) method for clustering high-dimensional co-occurrence data, which performs data clustering and low-dimensional feature subspace learning simultaneously. Unlike existing methods, IIB clusters the data while maximally preserving the correlations between the data clusters and the learned dimension-reduced features, and at the same time learns the low-dimensional feature subspace while maintaining its correlations with the data clustering results obtained in the previous iteration. The two stages are therefore interactive and refine each other. Finally, a new twin "draw-and-merge" method is designed for optimization. Experimental results on four high-dimensional datasets demonstrate the superiority and effectiveness of the proposed method. © 2021 Elsevier B.V. All rights reserved.
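The abstract does not spell out the twin "draw-and-merge" procedure, so the following is only a rough illustration of the general idea behind a draw-and-merge step, in the style of the classic sequential information bottleneck. It is applied to the data side only and is not the authors' IIB algorithm. Each data point is drawn out of its cluster and merged into whichever cluster keeps the mutual information I(T;Y) between clusters T and features Y highest. The function names, the sweep limit, and the brute-force recomputation of I(T;Y) are all assumptions made for this sketch.

```python
import numpy as np


def mutual_information(joint):
    """Return I(T;Y) in nats for a joint probability table p(t, y)."""
    pt = joint.sum(axis=1, keepdims=True)   # marginal p(t), shape (k, 1)
    py = joint.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, m)
    mask = joint > 0                        # skip zero cells (0 * log 0 = 0)
    ratio = joint[mask] / (pt @ py)[mask]
    return float((joint[mask] * np.log(ratio)).sum())


def cluster_joint(p_xy, labels, k):
    """Aggregate the rows of p(x, y) into p(t, y) using hard labels."""
    joint = np.zeros((k, p_xy.shape[1]))
    np.add.at(joint, labels, p_xy)          # sums rows that share a label
    return joint


def draw_and_merge(counts, k, n_sweeps=10, seed=0):
    """Greedy hard clustering of rows that tries to keep I(T;Y) high.

    Hypothetical helper for illustration: `counts` is a non-negative
    co-occurrence matrix (rows = data points, columns = features).
    """
    rng = np.random.default_rng(seed)
    p_xy = counts / counts.sum()
    n = p_xy.shape[0]
    labels = rng.permutation(n) % k         # random start, no empty cluster
    for _ in range(n_sweeps):
        changed = False
        for x in rng.permutation(n):
            old = int(labels[x])
            if np.sum(labels == old) == 1:
                continue                    # never empty a cluster
            scores = np.empty(k)
            for t in range(k):              # "draw" x, try merging into t
                labels[x] = t
                scores[t] = mutual_information(cluster_joint(p_xy, labels, k))
            best = int(np.argmax(scores))
            if scores[best] <= scores[old] + 1e-12:
                best = old                  # keep the old cluster on ties
            labels[x] = best
            if best != old:
                changed = True
        if not changed:
            break                           # a full sweep made no move
    return labels, mutual_information(cluster_joint(p_xy, labels, k))


if __name__ == "__main__":
    # Toy co-occurrence counts with two obvious row groups.
    toy = np.array([[10, 10, 0, 0],
                    [9, 11, 0, 0],
                    [0, 0, 10, 10],
                    [0, 0, 11, 9]], dtype=float)
    toy_labels, toy_mi = draw_and_merge(toy, k=2)
    print(toy_labels, toy_mi)
```

Recomputing I(T;Y) for every candidate merge is deliberately naive. Sequential information bottleneck implementations usually score a merge with a weighted Jensen-Shannon divergence instead, which avoids rebuilding the joint table each time.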
Pages: 11