Parameter-less co-clustering for star-structured heterogeneous data

Cited by: 39
Authors
Ienco, Dino [1, 3]
Robardet, Celine [2]
Pensa, Ruggero G. [1]
Meo, Rosa [1]
Affiliations
[1] Univ Turin, Dept Comp Sci, I-10139 Turin, Italy
[2] Univ Lyon, CNRS, INSA Lyon, LIRIS UMR5205, F-69621 Villeurbanne, France
[3] IRSTEA Montpellier, UMR TETIS, F-34093 Montpellier, France
Keywords
Co-clustering; Star-structured data; Multi-view data; Local search; Algorithms
DOI
10.1007/s10618-012-0248-z
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data described by multiple features from heterogeneous domains are increasingly common in real-world applications. Such data represent objects of a given type connected to other types of data, the features, so that the overall data schema forms a star structure of inter-relationships. Co-clustering these data requires specifying many parameters, such as the number of clusters for the object dimension and for each feature domain. In this paper we present a novel parameter-less co-clustering algorithm for heterogeneous star-structured data: it requires neither the number of row clusters nor the number of column clusters for the given feature spaces. Our approach optimizes Goodman and Kruskal's tau, a measure of cross-association in contingency tables that evaluates the strength of the relationship between two categorical variables. We extend tau to evaluate co-clustering solutions and, in particular, apply it in a higher-dimensional setting. We propose CoStar, an algorithm that optimizes tau through local search. We assess the performance of CoStar on publicly available datasets from the textual and image domains using objective external criteria. The results show that our approach outperforms state-of-the-art methods for co-clustering heterogeneous data while remaining computationally efficient.
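For a contingency table with row variable X and column variable Y and relative frequencies p_ij, Goodman and Kruskal's tau is tau_{Y|X} = (sum_i sum_j p_ij^2 / p_i. - sum_j p_.j^2) / (1 - sum_j p_.j^2), the proportional reduction in the error of predicting Y once X is known. The sketch below is a simplified illustration, not CoStar itself. It assumes a single non-negative object-by-feature matrix (not the full star schema), builds the cluster-level contingency table by summing matrix entries inside each row-cluster by column-cluster block (an assumption; the abstract does not give the construction), scores the row partition with tau_{R|C}, and runs one greedy local-search pass in which an object may also open a new cluster, so the number of row clusters is never fixed in advance. All function names are hypothetical.

```python
from typing import List, Sequence

Table = List[List[float]]


def goodman_kruskal_tau(table: Sequence[Sequence[float]]) -> float:
    """tau_{Y|X}: proportional reduction in the error of predicting the
    column variable Y once the row variable X is known."""
    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("table needs a positive total count")
    p = [[v / total for v in row] for row in table]
    row_marg = [sum(row) for row in p]
    col_marg = [sum(col) for col in zip(*p)]
    sum_col_sq = sum(q * q for q in col_marg)
    e_y = 1.0 - sum_col_sq  # error of predicting Y from its marginal alone
    if e_y == 0.0:
        return 0.0  # convention used here: Y is constant, nothing to explain
    # sum_i sum_j p_ij^2 / p_i. (empty rows contribute nothing)
    explained = sum(
        sum(v * v for v in row) / r for row, r in zip(p, row_marg) if r > 0
    )
    return (explained - sum_col_sq) / e_y


def coclustering_table(data, row_labels, col_labels) -> Table:
    """Sum a non-negative object x feature matrix into a
    (row cluster x column cluster) contingency table (assumed construction)."""
    table = [[0.0] * (max(col_labels) + 1) for _ in range(max(row_labels) + 1)]
    for i, row in enumerate(data):
        for j, v in enumerate(row):
            table[row_labels[i]][col_labels[j]] += v
    return table


def transpose(table: Sequence[Sequence[float]]) -> Table:
    return [list(col) for col in zip(*table)]


def row_score(data, row_labels, col_labels) -> float:
    """tau_{R|C}: how well the column clusters predict the row clusters."""
    return goodman_kruskal_tau(
        transpose(coclustering_table(data, row_labels, col_labels)))


def improve_rows_once(data, row_labels, col_labels) -> List[int]:
    """One greedy pass (hypothetical helper): each object moves to the
    existing or brand-new row cluster that most increases tau_{R|C}.
    A move is kept only if it strictly improves the score."""
    labels = list(row_labels)
    for i in range(len(labels)):
        best, best_score = labels[i], row_score(data, labels, col_labels)
        for target in range(max(labels) + 2):  # max + 1 opens a new cluster
            labels[i] = target
            score = row_score(data, labels, col_labels)
            if score > best_score:
                best, best_score = target, score
        labels[i] = best
    return labels


if __name__ == "__main__":
    # Toy block-structured matrix: objects 0-1 use features 0-1,
    # objects 2-3 use features 2-3.
    data = [[3, 3, 0, 0], [2, 4, 0, 0], [0, 0, 5, 1], [0, 0, 2, 4]]
    cols = [0, 0, 1, 1]
    print(row_score(data, [0, 0, 1, 1], cols))  # 1.0: blocks fully recovered
    print(row_score(data, [0, 1, 0, 1], cols))  # 0.0: clusters independent
    print(improve_rows_once(data, [0, 1, 0, 1], cols))
```

Scoring rows with tau_{R|C} rather than tau_{C|R} is a deliberate choice in this sketch: refining the predictor side of tau never lowers it, so tau_{C|R} alone would reward splitting objects into ever smaller clusters, whereas tau_{R|C} has no such bias toward singletons. A full co-clustering search would alternate this step with the symmetric one on column labels; how CoStar extends this to the several feature spaces of a star schema is not described in the abstract and is not reproduced here.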
Pages: 217-254
Number of pages: 38