Cluster-based sparse topical coding for topic mining and document clustering

Cited by: 9
Authors
Ahmadi, Parvin [1]
Gholampour, Iman [2]
Tabandeh, Mahmoud [1]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Sharif Univ Technol, Elect Res Inst, Tehran, Iran
Keywords
Document clustering; Topic model; Sparse topical coding; K-means
DOI
10.1007/s11634-017-0280-3
Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Subject classification codes
020208; 070103; 0714
Abstract
In this paper, we introduce a document clustering method based on Sparse Topical Coding, called Cluster-based Sparse Topical Coding. Topic modeling can improve textual document clustering by describing documents via bag-of-words models and projecting them into a topic space. The latent semantic descriptions derived by the topic model can then be used as features in a clustering process. In our proposed method, document clustering and topic modeling are integrated in a unified framework so that each step can benefit the other. This framework includes Sparse Topical Coding, which is responsible for topic mining, and K-means, which discovers the latent clusters in the document collection. Experimental results on widely used datasets show that our proposed method significantly outperforms traditional clustering methods and other topic-model-based clustering methods. Our method achieves improvements of 4% to 39% in clustering accuracy and of 2% to more than 44% in normalized mutual information.
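The abstract reports gains in normalized mutual information (NMI) without defining it. As a minimal sketch of how such a score can be computed from ground-truth labels and cluster assignments, the example below uses only the Python standard library. The function names are hypothetical, and normalizing by the arithmetic mean of the two entropies is an assumption; the record does not state which normalization the paper uses.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two labelings of the same items."""
    n = len(a)
    joint = Counter(zip(a, b))
    count_a, count_b = Counter(a), Counter(b)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
    return mi


def nmi(true_labels, cluster_ids):
    """NMI normalized by the arithmetic mean of the entropies (assumption)."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("both labelings must cover the same documents")
    h_true, h_clust = entropy(true_labels), entropy(cluster_ids)
    if h_true == 0.0 and h_clust == 0.0:
        return 1.0  # both labelings are a single group, so they agree trivially
    return mutual_information(true_labels, cluster_ids) / ((h_true + h_clust) / 2)


if __name__ == "__main__":
    # Cluster ids are arbitrary, so a relabeled but identical partition scores 1.
    print(nmi(["sport", "sport", "tech", "tech"], [1, 1, 0, 0]))
    # A partition that is independent of the true classes scores 0.
    print(nmi(["sport", "sport", "tech", "tech"], [0, 1, 0, 1]))
```

Because the score depends only on how items are grouped, not on the numeric cluster ids, it can compare K-means output directly against class labels without first matching clusters to classes.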
Pages: 537-558
Number of pages: 22
Related papers
50 in total
  • [1] Cluster-based sparse topical coding for topic mining and document clustering
    Parvin Ahmadi
    Iman Gholampour
    Mahmoud Tabandeh
    Advances in Data Analysis and Classification, 2018, 12 : 537 - 558
  • [2] Cluster-based Language Model for Spoken Document Retrieval Using NMF-Based Document Clustering
    Hu, Xinhui
    Isotani, Ryosuke
    Kawai, Hisashi
    Nakamura, Satoshi
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 705 - 708
  • [3] Document clustering and cluster topic extraction in multilingual corpora
    Silva, J
    Mexia, J
    Coelho, A
    Lopes, G
    2001 IEEE INTERNATIONAL CONFERENCE ON DATA MINING, PROCEEDINGS, 2001, : 513 - 520
  • [4] Clustering Improvement via Integrating with Sparse Topical Coding
    Ahmadi, Parvin
    Kaviani, Razie
    Gholampour, Iman
    Tabandeh, Mahmoud
    2015 23RD IRANIAN CONFERENCE ON ELECTRICAL ENGINEERING (ICEE), 2015, : 466 - 471
  • [5] Sparse Poisson Coding for High Dimensional Document Clustering
    Wu, Chenxia
    Yang, Haiqin
    Zhu, Jianke
    Zhang, Jiemi
    King, Irwin
    Lyu, Michael R.
    2013 IEEE INTERNATIONAL CONFERENCE ON BIG DATA, 2013,
  • [6] Selective Cluster-Based Document Retrieval
    Levi, Or
    Raiber, Fiana
    Kurland, Oren
    Guy, Ido
    CIKM'16: PROCEEDINGS OF THE 2016 ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2016, : 1473 - 1482
  • [7] KS-cluster: A spectral clustering method based on kernelized sparse representation for document clustering
    Xing, Jieqing
    Wang, Chunteng
    ICIC Express Letters, 2015, 9 (10): : 2801 - 2806
  • [8] A Hybrid Method for Manufacturing Text Mining Based on Document Clustering and Topic Modeling Techniques
    Shotorbani, Peyman Yazdizadeh
    Ameri, Farhad
    Kulvatunyou, Boonserm
    Ivezic, Nenad
    ADVANCES IN PRODUCTION MANAGEMENT SYSTEMS: INITIATIVES FOR A SUSTAINABLE WORLD, 2016, 488 : 777 - 786
  • [9] Topic Mining Based on Graph Local Clustering
    Garza Villarreal, Sara Elena
    Brena, Ramon F.
    ADVANCES IN SOFT COMPUTING, PT II, 2011, 7095 : 201 - +
  • [10] Multiple Event Analysis for Large-scale Power Systems through Cluster-based Sparse Coding
    Song, Yang
    Wang, Wei
    Mang, Zhifei
    Qi, Hairong
    Liu, Yilu
    2015 IEEE INTERNATIONAL CONFERENCE ON SMART GRID COMMUNICATIONS (SMARTGRIDCOMM), 2015, : 301 - 306