Cluster-based sparse topical coding for topic mining and document clustering

Cited by: 9
Authors
Ahmadi, Parvin [1]
Gholampour, Iman [2]
Tabandeh, Mahmoud [1]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Sharif Univ Technol, Elect Res Inst, Tehran, Iran
Keywords
Document clustering; Topic model; Sparse topical coding; K-means;
DOI
10.1007/s11634-017-0280-3
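Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification code
020208; 070103; 0714;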
Abstract
In this paper, we introduce a document clustering method based on Sparse Topical Coding, called Cluster-based Sparse Topical Coding. Topic modeling can improve the clustering of text documents by describing documents with bag-of-words models and projecting them into a topic space. The latent semantic descriptions derived by the topic model can then be used as features in a clustering process. In our proposed method, document clustering and topic modeling are integrated in a unified framework in order to achieve the highest performance. This framework includes Sparse Topical Coding, which is responsible for topic mining, and K-means, which discovers the latent clusters in the document collection. Experimental results on widely used datasets show that our proposed method significantly outperforms traditional clustering methods and other topic-model-based clustering methods. Our method achieves an improvement of 4% to 39% in clustering accuracy and of 2% to more than 44% in normalized mutual information.
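The abstract reports results in terms of clustering accuracy and normalized mutual information (NMI). As a rough illustration of how these two scores are commonly computed, here is a minimal standard-library sketch. It is not the authors' evaluation code. The function names are hypothetical, the NMI normalization by the geometric mean of the two entropies is an assumption (the record does not say which variant the paper uses), and the brute-force label matching is only practical for a handful of clusters.

```python
from collections import Counter
from itertools import permutations
from math import log, sqrt


def normalized_mutual_information(true_labels, cluster_labels):
    """NMI = I(T; C) / sqrt(H(T) * H(C)), using natural logarithms.

    Assumption: geometric-mean normalization; other variants exist.
    """
    n = len(true_labels)
    true_counts = Counter(true_labels)
    cluster_counts = Counter(cluster_labels)
    joint_counts = Counter(zip(true_labels, cluster_labels))

    # Mutual information between the class and cluster assignments.
    mutual_info = 0.0
    for (t, c), n_tc in joint_counts.items():
        mutual_info += (n_tc / n) * log(n * n_tc / (true_counts[t] * cluster_counts[c]))

    def entropy(counts):
        return -sum((v / n) * log(v / n) for v in counts.values())

    h_true, h_cluster = entropy(true_counts), entropy(cluster_counts)
    if h_true == 0.0 or h_cluster == 0.0:
        return 0.0  # convention chosen here for degenerate single-group labelings
    return mutual_info / sqrt(h_true * h_cluster)


def clustering_accuracy(true_labels, cluster_labels):
    """Best fraction of matching labels over one-to-one cluster-to-class maps.

    Brute force over permutations, so only suitable for a few clusters.
    """
    classes = list(dict.fromkeys(true_labels))
    clusters = list(dict.fromkeys(cluster_labels))
    # Pad with None so surplus clusters can stay unmatched.
    targets = classes + [None] * max(0, len(clusters) - len(classes))

    best = 0
    for assignment in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, assignment))
        hits = sum(1 for t, c in zip(true_labels, cluster_labels) if mapping[c] == t)
        best = max(best, hits)
    return best / len(true_labels)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    predicted = [1, 1, 0, 0, 0, 0]  # toy cluster ids, arbitrary numbering
    print(clustering_accuracy(truth, predicted))
    print(normalized_mutual_information(truth, predicted))
```

Both scores ignore how cluster ids are numbered, which is why accuracy needs a matching step between clusters and classes before counting hits.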
Pages: 537-558
Number of pages: 22
Related papers
50 in total
  • [31] A Novel Graph Based Clustering Approach to Document Topic Modeling
    Chanda, Prateek
    Das, Asit Kumar
    2018 9TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2018,
  • [32] Novel Similarity Measure for Document Clustering Based on Topic Phrases
    ELdesoky, A. E.
    Saleh, M.
    Sakr, N. A.
    ICNM: 2009 INTERNATIONAL CONFERENCE ON NETWORKING & MEDIA CONVERGENCE, 2007, : 92 - +
  • [33] Topic mining based on word posterior probability in spoken document
    Zhang L.
    Chen G.-X.
    Xiang X.-Z.
    Chang J.-X.
    Journal of Software, 2011, 6 (11 SPEC. ISSUE) : 2292 - 2299
  • [34] Cluster-Based News Representative Generation with Automatic Incremental Clustering
    Shabirin, Irsal
    Barakbah, Ali Ridho
    Syarif, Iwan
    EMITTER-INTERNATIONAL JOURNAL OF ENGINEERING TECHNOLOGY, 2019, 7 (02) : 467 - 479
  • [35] Cluster-based Cooperative Communication with Network Coding in Wireless Networks
    Haas, Zygmunt J.
    Chen, Tuan-Che
    MILITARY COMMUNICATIONS CONFERENCE, 2010 (MILCOM 2010), 2010, : 2082 - 2089
  • [36] Cluster-based evaluation in fuzzy-genetic data mining
    Chen, Chun-Hao
    Tseng, Vincent S.
    Hong, Tzung-Pei
    IEEE TRANSACTIONS ON FUZZY SYSTEMS, 2008, 16 (01) : 249 - 262
  • [37] A cluster-based method for mining generalized fuzzy association rules
    Chiu, Hung-Pin
    Tang, Yi-Tsung
    Hsieh, Kun-Lin
    ICICIC 2006: FIRST INTERNATIONAL CONFERENCE ON INNOVATIVE COMPUTING, INFORMATION AND CONTROL, VOL 2, PROCEEDINGS, 2006, : 519 - +
  • [38] Cluster-based mining of microarray data in PHP/MYSQL environment
    Udoh, E.
    Bhuiyan, S.
    ADVANCES IN SYSTEMS, COMPUTING SCIENCES AND SOFTWARE ENGINEERING, 2006, : 197 - +
  • [39] The Research of Document Clustering Topical Concept Based on Neural Networks
    Fu, Xian
    Ding, Yi
    ADVANCES IN NEURAL NETWORKS - ISNN 2014, 2014, 8866 : 621 - 628
  • [40] An Intention-Topic Model Based on Verbs Clustering and Short Texts Topic Mining
    Lu, Tingting
    Hou, Shifeng
    Chen, Zhenxiang
    Cui, Lizhen
    Zhang, Lei
    CIT/IUCC/DASC/PICOM 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION TECHNOLOGY - UBIQUITOUS COMPUTING AND COMMUNICATIONS - DEPENDABLE, AUTONOMIC AND SECURE COMPUTING - PERVASIVE INTELLIGENCE AND COMPUTING, 2015, : 837 - 842