Interactive information bottleneck for high-dimensional co-occurrence data clustering

Cited by: 5
Authors
Hu, Shizhe [1]
Wang, Ruobin [1]
Ye, Yangdong [1]
Affiliation
[1] Zhengzhou University, School of Information Engineering, Zhengzhou 450001, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Clustering; High-dimensional data; Information bottleneck; Mixture model; Feature selection
DOI
10.1016/j.asoc.2021.107837
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering high-dimensional data is challenging because the features contain a great deal of redundant and irrelevant information. Most existing methods perform feature dimensionality reduction and data clustering either sequentially or jointly, with clustering carried out on the low-dimensional representations. However, these methods neglect the relationships between the clustered data points and the dimension-reduced features, as well as the influence of those relationships on learning the low-dimensional feature subspace. This paper proposes an embarrassingly simple yet effective interactive information bottleneck (IIB) method for clustering high-dimensional co-occurrence data, which performs data clustering and low-dimensional feature subspace learning simultaneously. Unlike existing methods, IIB clusters the data while maximally preserving the correlations between the data clusters and the learned dimension-reduced features, and it learns the low-dimensional feature subspace while maintaining the correlations with the data clustering results obtained in the previous iteration. The two stages are thus interactive and refine each other. A new twin "draw-and-merge" method is designed for optimization. Experimental results on four high-dimensional datasets demonstrate the effectiveness and superiority of the proposed method. © 2021 Elsevier B.V. All rights reserved.
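The quantity that the abstract describes as being preserved can be illustrated with a small sketch. This is not the authors' IIB algorithm and does not implement the "draw-and-merge" optimizer; it only assumes, as is standard in information bottleneck methods, that the "correlations" between data clusters and reduced features are measured by mutual information. The function names `cluster_joint` and `mutual_information` and the toy count matrix are hypothetical, introduced here for illustration only.

```python
import math


def cluster_joint(counts, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Aggregate a co-occurrence count matrix into a joint distribution
    p(t, s) over (data cluster t, feature cluster s)."""
    total = sum(sum(row) for row in counts)
    joint = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            joint[row_labels[i]][col_labels[j]] += c / total
    return joint


def mutual_information(joint):
    """Mutual information I(T; S) in bits for a joint distribution
    given as a nested list of probabilities."""
    p_t = [sum(row) for row in joint]          # marginal over data clusters
    p_s = [sum(col) for col in zip(*joint)]    # marginal over feature clusters
    mi = 0.0
    for t, row in enumerate(joint):
        for s, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (p_t[t] * p_s[s]))
    return mi


# Hypothetical toy co-occurrence counts: rows are data points, columns are features.
counts = [
    [4, 4, 0, 0],
    [4, 4, 0, 0],
    [0, 0, 4, 4],
    [0, 0, 4, 4],
]
feature_clusters = [0, 0, 1, 1]  # assumed feature grouping (reduced features)

# A row clustering that follows the block structure keeps all the information...
good = mutual_information(cluster_joint(counts, [0, 0, 1, 1], feature_clusters, 2, 2))
# ...while one that mixes the blocks keeps none.
bad = mutual_information(cluster_joint(counts, [0, 1, 0, 1], feature_clusters, 2, 2))
print(good, bad)
```

In an alternating scheme of the kind the abstract outlines, a score like this could be evaluated once with the feature grouping held fixed while the data clusters are updated, and once with the data clusters held fixed while the feature grouping is updated; how IIB actually performs those updates is specified in the paper, not here.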
Pages: 11