HICC: an entropy splitting-based framework for hierarchical co-clustering

Cited by: 0
Authors
Wei Cheng
Xiang Zhang
Feng Pan
Wei Wang
Affiliations
[1] University of North Carolina at Chapel Hill, Department of Computer Science
[2] Case Western Reserve University, Department of Electrical Engineering and Computer Science
[3] Microsoft, Department of Computer Science
[4] University of California
Keywords
Co-clustering; Entropy; Contingency table; Text analysis
DOI
Not available
Abstract
Two-dimensional contingency tables, or co-occurrence matrices, arise frequently in important applications such as text analysis and web-log mining. As a fundamental research topic, co-clustering aims to generate a meaningful partition of the contingency table that reveals hidden relationships between rows and columns. Traditional co-clustering algorithms usually produce a predefined number of flat partitions of both rows and columns, which do not reveal relationships among clusters. To address this limitation, hierarchical co-clustering algorithms have recently attracted considerable research interest. Although successful in various applications, existing hierarchical co-clustering algorithms are usually based on heuristics and lack a solid theoretical foundation. In this paper, we present a new co-clustering algorithm, HICC, with a solid theoretical foundation. It simultaneously constructs a hierarchical structure of both row and column clusters that retains sufficient mutual information between the rows and columns of the contingency table. We develop an efficient and effective greedy algorithm that grows a co-cluster hierarchy by successively performing the row-wise or column-wise split that leads to the maximal mutual information gain. Extensive experiments on both synthetic and real datasets demonstrate that our algorithm reveals essential relationships among row (and column) clusters and achieves better clustering precision than existing algorithms. Moreover, the experiments on real datasets show that HICC can effectively reveal hidden relationships between rows and columns in the contingency table.
Pages: 343-367
Page count: 24
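
The greedy step described in the abstract, choosing the split with the largest mutual information gain, can be illustrated with a small sketch. This is not the authors' implementation: it handles row splits only, leaves the columns unclustered, and searches every two-way split exhaustively, which is feasible only for very small tables. The function names `mutual_information`, `merge_rows`, and `best_row_split` are hypothetical.

```python
import math
from itertools import combinations


def mutual_information(table):
    """Mutual information (in bits) between rows and columns of a count table."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:  # zero cells contribute nothing
                mi += (n / total) * math.log2(n * total / (row_sums[i] * col_sums[j]))
    return mi


def merge_rows(table, clusters):
    """Collapse the rows of each cluster into one row by summing counts."""
    n_cols = len(table[0])
    return [[sum(table[i][j] for i in cluster) for j in range(n_cols)]
            for cluster in clusters]


def best_row_split(table, clusters, k):
    """Try every two-way split of row cluster k (assumption: exhaustive search).

    Returns (gain, left, right), where gain is the increase in mutual
    information over the current clustering.
    """
    base = mutual_information(merge_rows(table, clusters))
    members = clusters[k]
    first, rest = members[0], members[1:]
    best = (0.0, None, None)
    # Keep `first` on the left side so each split is enumerated once.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = [first] + list(combo)
            right = [m for m in members if m not in left]
            candidate = clusters[:k] + [left, right] + clusters[k + 1:]
            gain = mutual_information(merge_rows(table, candidate)) - base
            if gain > best[0]:
                best = (gain, left, right)
    return best


# Toy contingency table: rows 0-1 and rows 2-3 use disjoint columns.
table = [
    [5, 5, 0, 0],
    [4, 6, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 6, 4],
]
gain, left, right = best_row_split(table, [[0, 1, 2, 3]], 0)
print(gain, left, right)
```

Starting from a single cluster holding all rows, the mutual information is zero, so the gain of a split equals the mutual information retained after it. A full hierarchy would repeat this step over both row and column clusters and apply whichever split yields the larger gain.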