Cluster-based sparse topical coding for topic mining and document clustering

Cited by: 9
Authors
Ahmadi, Parvin [1]
Gholampour, Iman [2]
Tabandeh, Mahmoud [1]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Tehran, Iran
[2] Sharif Univ Technol, Elect Res Inst, Tehran, Iran
Keywords
Document clustering; Topic model; Sparse topical coding; K-means
DOI
10.1007/s11634-017-0280-3
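Chinese Library Classification (CLC) codes
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714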
Abstract
In this paper, we introduce Cluster-based Sparse Topical Coding, a document clustering method built on Sparse Topical Coding. Topic modeling can improve the clustering of text documents by representing each document as a bag of words and projecting it into a topic space. The latent semantic descriptions derived by the topic model can then be used as features in a clustering process. In the proposed method, document clustering and topic modeling are integrated in a unified framework in order to achieve the highest performance. This framework comprises Sparse Topical Coding, which is responsible for topic mining, and K-means, which discovers the latent clusters in the document collection. Experimental results on widely used datasets show that the proposed method significantly outperforms traditional clustering methods and other topic-model-based clustering methods. It achieves improvements of 4% to 39% in clustering accuracy and of 2% to more than 44% in normalized mutual information.
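The abstract reports results in terms of normalized mutual information (NMI) between the discovered clusters and the reference labels. As a rough illustration of how such a score can be computed, the following is a minimal sketch in plain Python. The function name and the choice of normalizing by the geometric mean of the two entropies are assumptions made for this example; the record does not state which NMI variant the authors used, and this is not their evaluation code.

```python
import math
from collections import Counter


def normalized_mutual_information(true_labels, cluster_labels):
    """Return NMI between two labelings of the same documents.

    Assumption: mutual information is normalized by sqrt(H(true) * H(cluster)).
    Other normalizations (arithmetic mean, min, max) are also in common use.
    """
    n = len(true_labels)
    if n == 0 or n != len(cluster_labels):
        raise ValueError("label lists must be non-empty and of equal length")

    count_true = Counter(true_labels)
    count_cluster = Counter(cluster_labels)
    count_joint = Counter(zip(true_labels, cluster_labels))

    # Mutual information: sum of p(t, c) * log(p(t, c) / (p(t) * p(c))).
    mutual_info = 0.0
    for (t, c), n_tc in count_joint.items():
        mutual_info += (n_tc / n) * math.log(
            n * n_tc / (count_true[t] * count_cluster[c])
        )

    def entropy(counts):
        # Shannon entropy of a labeling, from its label counts.
        return -sum((k / n) * math.log(k / n) for k in counts.values())

    h_true = entropy(count_true)
    h_cluster = entropy(count_cluster)
    if h_true == 0.0 or h_cluster == 0.0:
        # Degenerate case: at least one labeling puts everything in one group.
        return 1.0 if h_true == h_cluster else 0.0
    return mutual_info / math.sqrt(h_true * h_cluster)


if __name__ == "__main__":
    reference = [0, 0, 1, 1]
    # Cluster ids are arbitrary, so a relabeled perfect clustering still scores 1.
    print(normalized_mutual_information(reference, [1, 1, 0, 0]))
    # A clustering that is independent of the reference scores 0.
    print(normalized_mutual_information(reference, [0, 1, 0, 1]))
```

Because NMI depends only on how the documents are grouped, not on the cluster ids themselves, it can compare K-means output against reference classes without first matching cluster ids to class labels.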
Pages: 537-558
Page count: 22