Clustering with Partition Level Side Information

Cited by: 27
Authors
Liu, Hongfu [1]
Fu, Yun [1, 2]
Affiliations
[1] Northeastern Univ, Dept Elect & Comp Engn, Boston, MA 02115 USA
[2] Northeastern Univ, Coll Comp & Informat Sci, Boston, MA 02115 USA
Keywords
Clustering; Partition level side information; K-means; Utility function; Algorithms
DOI
10.1109/ICDM.2015.18
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Constrained clustering uses prior knowledge to improve clustering performance. The existing literature mostly focuses on Must-Link and Cannot-Link pairwise constraints. However, pairwise constraints do not match the way people make decisions, and they are vulnerable both to noisy constraints and to the order in which constraints are processed. In light of this, we use partition level side information instead of pairwise constraints to guide the clustering process. Compared with pairwise constraints, partition level side information keeps the partial structure consistent and avoids self-contradiction and the impact of constraint order. In general, only a small part of the data instances are labeled by human workers, and these labels are used to supervise the clustering procedure. Inspired by the success of ensemble clustering, we aim to find a clustering solution that captures the intrinsic structure of the data itself and agrees with the partition level side information as much as possible. We then derive the objective function and equivalently transform it into a K-means-like optimization problem. Extensive experiments on several real-world datasets demonstrate the effectiveness and efficiency of our method compared with pairwise constrained clustering and consensus clustering, which verifies the superiority of partition level side information over pairwise constraints. In addition, our method is highly robust to noisy side information.
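The idea of a K-means-like procedure that also agrees with a partial labeling can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the objective or the utility function, so the sketch assumes a squared Euclidean distance on one-hot label vectors as the agreement term, a hypothetical weight `lam`, the convention that `-1` marks an unlabeled instance, and a deterministic round-robin initialization. The function name `partition_guided_kmeans` is likewise made up for illustration.

```python
import numpy as np

def partition_guided_kmeans(X, side_labels, k, lam=1.0, n_iter=100):
    """K-means-like clustering guided by a partial labeling (illustrative only).

    side_labels: integer array, -1 for unlabeled points (assumed convention),
                 otherwise a class id in [0, n_classes).
    lam:         hypothetical weight on agreement with the side information.
    """
    X = np.asarray(X, dtype=float)
    side_labels = np.asarray(side_labels)
    n = X.shape[0]
    has_label = side_labels >= 0
    n_classes = int(side_labels.max()) + 1 if has_label.any() else 0

    # One-hot encoding of the side information; unlabeled rows stay all-zero.
    S = np.zeros((n, n_classes))
    S[has_label, side_labels[has_label]] = 1.0

    # Deterministic round-robin initialization (a simplification).
    assign = np.arange(n) % k
    cx = np.zeros((k, X.shape[1]))   # feature-space centroids
    cs = np.zeros((k, n_classes))    # side-information centroids

    for _ in range(n_iter):
        # Update step: feature centroids use all members of a cluster,
        # side-information centroids use only its labeled members.
        for j in range(k):
            members = assign == j
            if members.any():
                cx[j] = X[members].mean(axis=0)
            labeled = members & has_label
            cs[j] = S[labeled].mean(axis=0) if labeled.any() else 0.0

        # Assignment step: feature distance for every point, plus the
        # weighted side-information distance for labeled points only.
        d_feat = ((X[:, None, :] - cx[None, :, :]) ** 2).sum(axis=2)
        d_side = ((S[:, None, :] - cs[None, :, :]) ** 2).sum(axis=2)
        cost = d_feat + lam * has_label[:, None] * d_side

        new_assign = cost.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
    return assign

# Two small groups with one labeled point in each.
X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
side = [0, -1, -1, 1, -1, -1]
print(partition_guided_kmeans(X, side, k=2))
```

In this sketch, unlabeled points are assigned by feature distance alone, while labeled points also pay a penalty for joining a cluster whose labeled members mostly carry a different label. Setting `lam` to 0 reduces the procedure to plain K-means with the same initialization.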
Pages: 877-882
Page count: 6