Cluster-based sparse topical coding for topic mining and document clustering

Cited by: 9
Authors
Ahmadi, Parvin [1 ]
Gholampour, Iman [2 ]
Tabandeh, Mahmoud [1 ]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Sharif Univ Technol, Elect Res Inst, Tehran, Iran
Keywords
Document clustering; Topic model; Sparse topical coding; K-means;
DOI
10.1007/s11634-017-0280-3
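Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes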
020208 ; 070103 ; 0714 ;
Abstract
In this paper, we introduce a document clustering method based on Sparse Topical Coding, called Cluster-based Sparse Topical Coding. Topic modeling can improve textual document clustering by describing documents via bag-of-words models and projecting them into a topic space. The latent semantic descriptions derived by the topic model can then be used as features in a clustering process. In our proposed method, document clustering and topic modeling are integrated in a unified framework in order to achieve the highest performance. This framework includes Sparse Topical Coding, which is responsible for topic mining, and K-means, which discovers the latent clusters in the document collection. Experimental results on widely used datasets show that our proposed method significantly outperforms traditional and other topic-model-based clustering methods. Our method achieves an improvement of 4% to 39% in clustering accuracy and of 2% to more than 44% in normalized mutual information.
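As a rough illustration of the two-stage baseline the abstract contrasts against (topic-space features fed to K-means, then scored by normalized mutual information), the following sketch may help. It is not the integrated Cluster-based Sparse Topical Coding method itself: the topic codes and labels are hypothetical toy values, the deterministic centroid initialization is a simplification, and the arithmetic-mean normalization of NMI is an assumption, since the abstract does not state which normalization the authors use.

```python
import math
from collections import Counter

def kmeans(points, k, iters=20):
    """Plain Lloyd's K-means; initial centroids are the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: mean of the members; an empty cluster keeps its centroid.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def nmi(a, b):
    """NMI normalized by the arithmetic mean of the two label entropies (assumed choice)."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = sum(v / n * math.log(v * n / (ca[x] * cb[y])) for (x, y), v in cab.items())
    ha = -sum(v / n * math.log(v / n) for v in ca.values())
    hb = -sum(v / n * math.log(v / n) for v in cb.values())
    return 1.0 if ha + hb == 0 else 2 * mi / (ha + hb)

# Hypothetical sparse topic codes for six documents over three topics.
codes = [
    [0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9],
    [0.8, 0.2, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
    [0.1, 0.0, 0.9],
]
true_labels = [0, 1, 0, 1, 0, 1]  # hypothetical ground-truth classes

pred = kmeans(codes, k=2)
score = nmi(true_labels, pred)
print(pred, round(score, 3))
```

In this baseline the topic codes are fixed before clustering starts. The abstract's point is that learning the codes and the clusters jointly, in one framework, is what yields the reported gains.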
Pages: 537-558
Page count: 22
Related papers
50 in total
  • [21] Cluster-based information retrieval using pattern mining
    Youcef Djenouri
    Asma Belhadi
    Djamel Djenouri
    Jerry Chun-Wei Lin
    Applied Intelligence, 2021, 51 : 1888 - 1903
  • [22] Cluster-based information retrieval using pattern mining
    Djenouri, Youcef
    Belhadi, Asma
    Djenouri, Djamel
    Lin, Jerry Chun-Wei
    APPLIED INTELLIGENCE, 2021, 51 (04) : 1888 - 1903
  • [23] Spectral-Spatial Feature Learning Using Cluster-Based Group Sparse Coding for Hyperspectral Image Classification
    Zhang, Xiangrong
    Song, Qiang
    Gao, Zeyu
    Zheng, Yaoguo
    Weng, Peng
    Jiao, L. C.
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2016, 9 (09) : 4142 - 4159
  • [24] Multiple Event Detection and Recognition for Large-Scale Power Systems Through Cluster-Based Sparse Coding
    Song, Yang
    Wang, Wei
    Zhang, Zhifei
    Qi, Hairong
    Liu, Yilu
    IEEE TRANSACTIONS ON POWER SYSTEMS, 2017, 32 (06) : 4199 - 4210
  • [25] A Text Document Clustering Method Based on Topical Concept
    Ding, Yi
    Fu, Xian
    ADVANCES IN ELECTRONIC COMMERCE, WEB APPLICATION AND COMMUNICATION, VOL 1, 2012, 148 : 547 - 552
  • [26] Document clustering based on nonnegative sparse matrix factorization
    Yang, CF
    Ye, M
    Zhao, J
    ADVANCES IN NATURAL COMPUTATION, PT 2, PROCEEDINGS, 2005, 3611 : 557 - 563
  • [27] Interactive Cluster-Based Personalized Retrieval on Large Document Collections
    Belsis, Petros
    Konstantopoulos, Charalampos
    Mamalis, Basilis
    Pantziou, Grammati
    Skourlas, Christos
    NEW DIRECTIONS IN INTELLIGENT INTERACTIVE MULTIMEDIA, 2008, 142 : 211 - +
  • [28] Multihop Cluster-based Architecture for Sparse Wireless Sensor Networks
    Cano, C.
    Bellalta, B.
    Villalonga, R.
    Perello, J.
    2008 EUROPEAN WIRELESS CONFERENCE, 2008, : 296 - +
  • [29] An approach for document retrieval using cluster-based inverted indexing
    Chandwani, Gunjan
    Ahlawat, Anil
    Dubey, Gaurav
    JOURNAL OF INFORMATION SCIENCE, 2023, 49 (03) : 726 - 739
  • [30] Hybrid Indexing for Versioned Document Search with Cluster-based Retrieval
    Jin, Xin
    Agun, Daniel
    Yang, Tao
    Wu, Qinghao
    Shen, Yifan
    Zhao, Susen
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 377 - 386