Textual data summarization using the Self-Organized Co-Clustering model

Cited by: 12
Authors
Selosse, Margot [1]
Jacques, Julien [1]
Biernacki, Christophe [2, 3]
Affiliations
[1] Univ Lyon, Lyon & ERIC EA3083 2, 5 Ave Pierre Mendes, Bron 69500, France
[2] Univ Lille, UFR Math, Cite Sci, Villeneuve Dascq 59655, France
[3] INRIA, 40 Av Halley,Bat A,Pk Plaza, Villeneuve Dascq 59650, France
Keywords
Co-Clustering; Document-term matrix; Latent block model; Factorization; Matrix
DOI
10.1016/j.patcog.2020.107315
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Recently, different studies have demonstrated the use of co-clustering, a data mining technique that simultaneously produces row-clusters of observations and column-clusters of features. The present work introduces a novel co-clustering model to easily summarize textual data in a document-term format. In addition to highlighting homogeneous co-clusters, as other existing algorithms do, we also distinguish noisy co-clusters from significant co-clusters, which is particularly useful for sparse document-term matrices. Furthermore, our model proposes a structure among the significant co-clusters, thus providing improved interpretability to users. The proposed approach is competitive with state-of-the-art methods for document and term clustering and offers user-friendly results. The model relies on the Poisson distribution and on a constrained version of the Latent Block Model, which is a probabilistic approach for co-clustering. A Stochastic Expectation-Maximization algorithm is proposed to run the model's inference, together with a model selection criterion to choose the number of co-clusters. Both simulated and real data sets illustrate the efficiency of this model through its ability to easily identify relevant co-clusters. © 2020 Elsevier Ltd. All rights reserved.
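To make the idea of a Poisson block model on a document-term matrix more concrete, the following is a minimal sketch and not the model of the paper. It assumes the row and column partitions are already known and fits one unconstrained Poisson rate per co-cluster. The paper's constrained structure (noisy versus significant co-clusters), its Stochastic Expectation-Maximization inference, and its model selection criterion are not reproduced here. All names (`block_rates`, `poisson_log_likelihood`, the toy `counts` matrix) are hypothetical.

```python
import math


def block_rates(counts, row_labels, col_labels, n_row_clusters, n_col_clusters):
    """Estimate one Poisson rate per co-cluster as the mean count of its cells."""
    sums = [[0.0] * n_col_clusters for _ in range(n_row_clusters)]
    sizes = [[0] * n_col_clusters for _ in range(n_row_clusters)]
    for i, row in enumerate(counts):
        for j, x in enumerate(row):
            k, l = row_labels[i], col_labels[j]
            sums[k][l] += x
            sizes[k][l] += 1
    return [
        [sums[k][l] / sizes[k][l] if sizes[k][l] else 0.0
         for l in range(n_col_clusters)]
        for k in range(n_row_clusters)
    ]


def poisson_log_likelihood(counts, row_labels, col_labels, rates):
    """Log-likelihood of the counts given fixed partitions and block rates."""
    total = 0.0
    for i, row in enumerate(counts):
        for j, x in enumerate(row):
            lam = rates[row_labels[i]][col_labels[j]]
            if lam == 0.0:
                # A zero rate only explains zero counts.
                if x > 0:
                    return float("-inf")
                continue
            total += x * math.log(lam) - lam - math.lgamma(x + 1)
    return total


# Toy document-term matrix (assumed data): 4 documents x 4 terms.
counts = [
    [5, 4, 0, 0],
    [6, 5, 0, 1],
    [0, 0, 3, 4],
    [1, 0, 4, 3],
]
col_labels = [0, 0, 1, 1]

# A partition that matches the visible block structure ...
good_rows = [0, 0, 1, 1]
good_rates = block_rates(counts, good_rows, col_labels, 2, 2)
good_ll = poisson_log_likelihood(counts, good_rows, col_labels, good_rates)

# ... and one that mixes the two groups of documents.
bad_rows = [0, 1, 0, 1]
bad_rates = block_rates(counts, bad_rows, col_labels, 2, 2)
bad_ll = poisson_log_likelihood(counts, bad_rows, col_labels, bad_rates)
```

In this toy setting the partition that follows the block structure yields the higher log-likelihood, which is the quantity a likelihood-based co-clustering procedure tries to increase when it searches over partitions.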
Pages: 13