Constrained Co-Clustering for Textual Documents

Authors
Song, Yangqiu [1]
Pan, Shimei [2]
Liu, Shixia [1]
Wei, Furu [1]
Zhou, Michelle X. [3]
Qian, Weihong [1]
Affiliations
[1] IBM Res China, Beijing, Peoples R China
[2] IBM Res TJ Watson Ctr, Hawthorne, NY USA
[3] IBM Res Almaden Ctr, San Jose, CA USA
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a constrained co-clustering approach for clustering textual documents. Our approach combines the benefits of information-theoretic co-clustering and constrained clustering. We use a two-sided hidden Markov random field (HMRF) to model both the document and word constraints. We also develop an alternating expectation maximization (EM) algorithm to optimize the constrained co-clustering model. We have conducted two sets of experiments on a benchmark data set: (1) using human-provided category labels to derive document and word constraints for semi-supervised document clustering, and (2) using automatically extracted named entities to derive document constraints for unsupervised document clustering. Compared to several representative constrained clustering and co-clustering approaches, our approach is shown to be more effective for high-dimensional, sparse text data.
Pages: 581-586
Number of pages: 6