Parameter-less co-clustering for star-structured heterogeneous data

Cited by: 39
Authors
Ienco, Dino [1, 3]
Robardet, Celine [2]
Pensa, Ruggero G. [1]
Meo, Rosa [1]
Affiliations
[1] Univ Turin, Dept Comp Sci, I-10139 Turin, Italy
[2] Univ Lyon, CNRS, INSA Lyon, LIRIS UMR5205, F-69621 Villeurbanne, France
[3] IRSTEA Montpellier, UMR TETIS, F-34093 Montpellier, France
Keywords
Co-clustering; Star-structured data; Multi-view data; Local search; Algorithms
DOI
10.1007/s10618-012-0248-z
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data described by multiple feature sets from heterogeneous domains are increasingly common in real-world applications. Such data represent objects of a certain type that are connected to other types of data, the features, so that the overall data schema forms a star structure of inter-relationships. Co-clustering these data requires specifying many parameters, such as the number of clusters for the object dimension and for every feature domain. In this paper we present a novel co-clustering algorithm for heterogeneous star-structured data that is parameter-less: it requires neither the number of row clusters nor the number of column clusters for the given feature spaces. Our approach optimizes Goodman-Kruskal's tau, a measure of cross-association in contingency tables that evaluates the strength of the relationship between two categorical variables. We extend tau to evaluate co-clustering solutions and, in particular, apply it in a higher-dimensional setting. We propose the algorithm CoStar, which optimizes tau by a local search approach. We assess the performance of CoStar on publicly available datasets from the textual and image domains using objective external criteria. The results show that our approach outperforms state-of-the-art methods for the co-clustering of heterogeneous data while remaining computationally efficient.
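To make the objective mentioned in the abstract concrete, the following is a minimal sketch of Goodman-Kruskal's tau computed on a single two-way contingency table, together with a helper that collapses a table into a co-cluster table given row and column cluster labels. It is not the CoStar algorithm and does not cover the star-structured extension; the function names `goodman_kruskal_tau` and `aggregate` and the example data are assumptions made for illustration only.

```python
def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the column variable from the row variable.

    `table` is a list of rows of non-negative counts (a contingency table).
    Returns the proportional reduction in prediction error, a value in [0, 1].
    """
    total = sum(sum(row) for row in table)
    if total == 0:
        raise ValueError("empty contingency table")
    n_cols = len(table[0])
    # Marginal probabilities of rows and columns.
    row_marg = [sum(row) / total for row in table]
    col_marg = [sum(row[j] for row in table) / total for j in range(n_cols)]
    col_sq = sum(p * p for p in col_marg)
    if col_sq == 1.0:
        # Only one column category carries mass: nothing to predict.
        return 0.0
    # Sum over cells of p_ij^2 / p_i., skipping empty rows.
    cond = sum(
        (v / total) ** 2 / r
        for row, r in zip(table, row_marg) if r > 0
        for v in row
    )
    return (cond - col_sq) / (1.0 - col_sq)


def aggregate(table, row_labels, col_labels):
    """Collapse `table` into a co-cluster table (hypothetical helper).

    row_labels[i] and col_labels[j] are integer cluster ids starting at 0.
    """
    k = max(row_labels) + 1
    l = max(col_labels) + 1
    out = [[0] * l for _ in range(k)]
    for i, row in enumerate(table):
        for j, v in enumerate(row):
            out[row_labels[i]][col_labels[j]] += v
    return out


# Illustrative data only: a block-diagonal table with two obvious co-clusters.
data = [
    [3, 2, 0, 0],
    [1, 4, 0, 0],
    [0, 0, 5, 1],
    [0, 0, 2, 2],
]
co_table = aggregate(data, [0, 0, 1, 1], [0, 0, 1, 1])
tau = goodman_kruskal_tau(co_table)
```

Under these assumptions, a partition that aligns row and column clusters perfectly yields a tau of 1 on the co-cluster table, while a table whose rows and columns are independent yields 0. A local search in the spirit described above would compare such scores across candidate partitions.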
Pages: 217-254
Number of pages: 38
Related papers
50 in total
  • [1] Parameter-less co-clustering for star-structured heterogeneous data
    Dino Ienco
    Céline Robardet
    Ruggero G. Pensa
    Rosa Meo
    Data Mining and Knowledge Discovery, 2013, 26 : 217 - 254
  • [2] Parameter-Less Tensor Co-clustering
    Battaglia, Elena
    Pensa, Ruggero G.
    DISCOVERY SCIENCE (DS 2019), 2019, 11828 : 205 - 219
  • [3] A parameter-less algorithm for tensor co-clustering
    Elena Battaglia
    Ruggero G. Pensa
    Machine Learning, 2023, 112 : 385 - 427
  • [4] A parameter-less algorithm for tensor co-clustering
    Battaglia, Elena
    Pensa, Ruggero G.
    MACHINE LEARNING, 2023, 112 (02) : 385 - 427
  • [5] Star-structured high-order heterogeneous data co-clustering based on consistent information theory
    Gao, Bin
    Liu, Tie-Yan
    Ma, Wei-Ying
    ICDM 2006: SIXTH INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2006, : 880 - +
  • [6] Clustering multi-typed objects in extended star-structured heterogeneous data
    Huang, Yue
    INTELLIGENT DATA ANALYSIS, 2017, 21 (02) : 225 - 241
  • [7] A novel parameter-less clustering method for mining gene expression data
    Tseng, VSM
    Kao, CP
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2004, 3056 : 692 - 698
  • [8] Fuzzy Clustering Approach for Star-Structured Multi-Type Relational Data
    Mei, Jian-Ping
    Chen, Lihui
    IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS (FUZZ 2011), 2011, : 2500 - 2506
  • [9] Heterogeneous Sparse Relational Data Co-Clustering in Social network
    Shen, Guowei
    Wang, Wei
    Yang, Wu
    Yu, Miao
    Dong, Guozhong
    IEEE 12TH INT CONF UBIQUITOUS INTELLIGENCE & COMP/IEEE 12TH INT CONF ADV & TRUSTED COMP/IEEE 15TH INT CONF SCALABLE COMP & COMMUN/IEEE INT CONF CLOUD & BIG DATA COMP/IEEE INT CONF INTERNET PEOPLE AND ASSOCIATED SYMPOSIA/WORKSHOPS, 2015, : 77 - 84
  • [10] Joint co-clustering: Co-clustering of genomic and clinical bioimaging data
    Ficarra, Elisa
    De Micheli, Giovanni
    Yoon, Sungroh
    Benini, Luca
    Macii, Enrico
    COMPUTERS & MATHEMATICS WITH APPLICATIONS, 2008, 55 (05) : 938 - 949