Parameter-less co-clustering for star-structured heterogeneous data

Cited by: 39
Authors
Ienco, Dino [1, 3]
Robardet, Celine [2]
Pensa, Ruggero G. [1]
Meo, Rosa [1]
Affiliations
[1] Univ Turin, Dept Comp Sci, I-10139 Turin, Italy
[2] Univ Lyon, CNRS, INSA Lyon, LIRIS UMR5205, F-69621 Villeurbanne, France
[3] IRSTEA Montpellier, UMR TETIS, F-34093 Montpellier, France
Keywords
Co-clustering; Star-structured data; Multi-view data; Local search; Algorithms
DOI
10.1007/s10618-012-0248-z
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Data described by multiple features from heterogeneous domains are increasingly common in real-world applications. Such data represent objects of one type connected to other types of data, the features, so that the overall data schema forms a star of inter-relationships centered on the objects. Co-clustering these data usually requires specifying many parameters, such as the number of clusters for the object dimension and for each feature domain. In this paper we present a novel co-clustering algorithm for heterogeneous star-structured data that is parameter-less: it requires neither the number of row clusters nor the number of column clusters for any of the given feature spaces. Our approach optimizes Goodman and Kruskal's tau, a measure of cross-association in contingency tables that evaluates the strength of the relationship between two categorical variables. We extend tau to evaluate co-clustering solutions and, in particular, apply it in a higher-dimensional setting. The proposed algorithm, CoStar, optimizes tau by local search. We assess the performance of CoStar on publicly available textual and image datasets using objective external criteria. The results show that our approach outperforms state-of-the-art methods for co-clustering heterogeneous data while remaining computationally efficient.
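The abstract describes Goodman and Kruskal's tau only in words. As an illustration, the Python sketch below computes the classical two-variable tau on a co-clustering contingency table: cell (k, j) sums the data values that fall in row cluster k and column cluster j, and tau is the proportional reduction in error when predicting the column cluster from the row cluster, tau = (sum_kj p_kj^2 / p_k. - sum_j p_.j^2) / (1 - sum_j p_.j^2). This is a minimal sketch under stated assumptions, not CoStar itself: the abstract does not specify how the paper extends tau to the star-structured, higher-dimensional setting or how its local search proceeds. The helper names `contingency_table`, `goodman_kruskal_tau`, and `transpose` are hypothetical; cluster labels are assumed to be integers starting at 0, and the data are assumed nonnegative (for example, word counts).

```python
"""Classical Goodman-Kruskal tau evaluated on a co-clustering contingency table.

Illustrative sketch only: this is the standard two-variable tau, not the
star-structured extension used by CoStar. Function names are hypothetical.
"""


def contingency_table(data, row_labels, col_labels):
    """Sum data[i][j] into cell (row_labels[i], col_labels[j]).

    Assumes nonnegative data and cluster labels that are integers from 0.
    The numbers of clusters are read from the labels, not passed in.
    """
    n_row_clusters = max(row_labels) + 1
    n_col_clusters = max(col_labels) + 1
    table = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            table[row_labels[i]][col_labels[j]] += value
    return table


def transpose(table):
    """Swap rows and columns, to compute tau in the other direction."""
    return [list(column) for column in zip(*table)]


def goodman_kruskal_tau(table):
    """Proportional reduction in error when predicting the column category
    from the row category:
        tau = (sum_kj p_kj^2 / p_k. - sum_j p_.j^2) / (1 - sum_j p_.j^2)
    """
    total = sum(sum(row) for row in table)
    p = [[cell / total for cell in row] for row in table]
    n_rows, n_cols = len(p), len(p[0])
    row_marg = [sum(row) for row in p]
    col_marg = [sum(p[k][j] for k in range(n_rows)) for j in range(n_cols)]
    # Expected error predicting the column category from its marginal alone.
    base_error = 1.0 - sum(c * c for c in col_marg)
    # Expected error when the row category is known (empty rows skipped).
    cond_error = 1.0 - sum(
        p[k][j] ** 2 / row_marg[k]
        for k in range(n_rows)
        for j in range(n_cols)
        if row_marg[k] > 0
    )
    if base_error == 0.0:
        return 0.0  # degenerate: all mass in a single column category
    return (base_error - cond_error) / base_error


if __name__ == "__main__":
    data = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ]
    good = contingency_table(data, [0, 0, 1, 1], [0, 0, 1, 1])
    mixed = contingency_table(data, [0, 1, 0, 1], [0, 0, 1, 1])
    print(goodman_kruskal_tau(good), goodman_kruskal_tau(transpose(good)))
    print(goodman_kruskal_tau(mixed))
```

On this block-diagonal matrix, the partition that matches the blocks gives tau = 1 in both directions, while a row partition that mixes the blocks gives tau = 0. Because tau is asymmetric, the reverse direction (predicting row clusters from column clusters) is obtained by transposing the table. The numbers of clusters are read from the labels rather than fixed in advance, so a search over labelings is free to change them.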
Pages: 217-254
Number of pages: 38