HICC: an entropy splitting-based framework for hierarchical co-clustering

Authors
Wei Cheng
Xiang Zhang
Feng Pan
Wei Wang
Affiliations
[1] University of North Carolina at Chapel Hill, Department of Computer Science
[2] Case Western Reserve University, Department of Electrical Engineering and Computer Science
[3] Microsoft, Department of Computer Science
[4] University of California
Keywords
Co-clustering; Entropy; Contingency table; Text analysis
Abstract
Two-dimensional contingency tables, or co-occurrence matrices, arise frequently in important applications such as text analysis and web-log mining. As a fundamental research topic, co-clustering aims to generate a meaningful partition of the contingency table that reveals hidden relationships between rows and columns. Traditional co-clustering algorithms usually produce a predefined number of flat partitions of both rows and columns, which does not reveal the relationships among clusters. To address this limitation, hierarchical co-clustering algorithms have recently attracted considerable research interest. Although successful in various applications, the existing hierarchical co-clustering algorithms are usually based on heuristics and lack a solid theoretical foundation. In this paper, we present a new co-clustering algorithm, HICC, with a solid theoretical background. It simultaneously constructs a hierarchical structure of both row and column clusters that retains sufficient mutual information between the rows and columns of the contingency table. An efficient and effective greedy algorithm is developed, which grows a co-cluster hierarchy by successively performing the row-wise or column-wise split that leads to the maximal mutual information gain. Extensive experiments on both synthetic and real datasets demonstrate that our algorithm reveals essential relationships among row (and column) clusters and achieves better clustering precision than existing algorithms. Moreover, the experiments on real datasets show that HICC can effectively reveal hidden relationships between rows and columns in the contingency table.
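The greedy step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`mutual_information`, `candidate_splits`, `greedy_step`) are hypothetical, and the candidate generator, which peels a single member off a cluster, is a deliberately simple stand-in for whatever split procedure HICC actually uses. The sketch also assumes the hierarchy is seeded with at least one existing split, since splitting rows while all columns sit in one cluster cannot raise the mutual information above zero.

```python
from math import log2

def mutual_information(table, row_clusters, col_clusters):
    """Mutual information (bits) of the table aggregated by the given clusters."""
    total = sum(sum(row) for row in table)
    # Aggregate counts into a (row cluster) x (column cluster) joint distribution.
    joint = [[sum(table[i][j] for i in rc for j in cc) / total
              for cc in col_clusters] for rc in row_clusters]
    p_r = [sum(r) for r in joint]
    p_c = [sum(c) for c in zip(*joint)]
    return sum(p * log2(p / (p_r[a] * p_c[b]))
               for a, r in enumerate(joint)
               for b, p in enumerate(r) if p > 0)

def candidate_splits(clusters):
    """Assumed split rule: peel one member off one cluster (illustrative only)."""
    for k, cluster in enumerate(clusters):
        if len(cluster) < 2:
            continue
        for member in cluster:
            rest = [m for m in cluster if m != member]
            yield clusters[:k] + [rest, [member]] + clusters[k + 1:]

def greedy_step(table, row_clusters, col_clusters):
    """Return (gain, rows, cols) for the row-wise or column-wise split with the largest gain."""
    base = mutual_information(table, row_clusters, col_clusters)
    best = (0.0, row_clusters, col_clusters)
    for rows in candidate_splits(row_clusters):
        gain = mutual_information(table, rows, col_clusters) - base
        if gain > best[0]:
            best = (gain, rows, col_clusters)
    for cols in candidate_splits(col_clusters):
        gain = mutual_information(table, row_clusters, cols) - base
        if gain > best[0]:
            best = (gain, row_clusters, cols)
    return best

# Toy block-diagonal contingency table (made-up data for illustration).
table = [[5, 5, 0, 0],
         [5, 5, 0, 0],
         [0, 0, 5, 5],
         [0, 0, 5, 5]]
gain, rows, cols = greedy_step(table, [[0, 1, 2, 3]], [[0, 1], [2, 3]])
```

On this toy table the column-wise candidates yield no gain because all rows still form one cluster, so the step selects a row-wise split. Calling `greedy_step` repeatedly on its own output would grow the hierarchy one split at a time.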
Pages: 343-367
Page count: 24