A three-stage unsupervised dimension reduction method for text clustering

Cited by: 32
Authors
Bharti, Kusum Kumari [1]
Singh, P. K. [1]
Affiliations
[1] ABV Indian Inst Informat Technol & Management Gwa, Computat Intelligence & Data Min Res Lab, Gwalior, MP, India
Keywords
Feature selection; Feature extraction; Dimension reduction; Sparsity; Three-stage model; Text clustering; FEATURE-SELECTION; MUTUAL INFORMATION; ALGORITHM
DOI
10.1016/j.jocs.2013.11.007
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Dimension reduction is a well-known pre-processing step in text clustering that removes irrelevant, redundant and noisy features without sacrificing the performance of the underlying algorithm. Dimension reduction methods are primarily classified as feature selection (FS) methods and feature extraction (FE) methods. Though FS methods are robust against irrelevant features, they occasionally fail to retain important information present in the original feature space. On the other hand, though FE methods reduce dimensions in the feature space without losing much information, they are significantly affected by irrelevant features. The one-stage models (FS or FE methods alone) and the two-stage models (a combination of FS and FE methods) proposed in the literature are not sufficient to fulfil all the above-mentioned requirements of dimension reduction. Therefore, we propose three-stage dimension reduction models to remove irrelevant, redundant and noisy features in the original feature space without losing much valuable information. These models incorporate the advantages of the FS and the FE methods to create a low-dimensional feature subspace. Experiments on three well-known benchmark text datasets of different characteristics show that the proposed three-stage models significantly improve the performance of the clustering algorithm as measured by micro F-score, macro F-score, and total execution time. © 2013 Elsevier B.V. All rights reserved.
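The abstract does not name the techniques used at each stage, so the following is only a minimal sketch of what a three-stage FS + FS + FE pipeline could look like. The specific choices here (a variance filter for irrelevant terms, an absolute-cosine check for redundant terms, PCA for extraction), the function names, the thresholds and the toy document-term matrix are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def select_by_variance(X, keep):
    """Stage 1 (FS, assumed): indices of the `keep` highest-variance columns."""
    order = np.argsort(X.var(axis=0))[::-1][:keep]
    return np.sort(order)

def drop_redundant(X, threshold):
    """Stage 2 (FS, assumed): greedily keep a column only if its absolute
    cosine similarity with every already kept column is <= threshold."""
    norms = np.linalg.norm(X, axis=0)
    norms[norms == 0] = 1.0          # avoid division by zero for empty terms
    U = X / norms
    kept = []
    for j in range(X.shape[1]):
        if all(abs(U[:, j] @ U[:, k]) <= threshold for k in kept):
            kept.append(j)
    return np.array(kept, dtype=int)

def pca_project(X, n_components):
    """Stage 3 (FE, assumed): project centered data onto the top principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def three_stage_reduce(X, keep, threshold, n_components):
    """Chain the three illustrative stages into one reduction step."""
    X1 = X[:, select_by_variance(X, keep)]
    X2 = X1[:, drop_redundant(X1, threshold)]
    return pca_project(X2, min(n_components, X2.shape[1]))

# Hypothetical 6-document x 6-term count matrix:
# terms 4 and 5 are constant (irrelevant), term 1 duplicates term 0 (redundant).
X = np.array([
    [3, 3, 0, 0, 1, 1],
    [2, 2, 0, 1, 1, 1],
    [4, 4, 1, 0, 1, 1],
    [0, 0, 3, 2, 1, 1],
    [0, 0, 2, 3, 1, 1],
    [1, 1, 4, 3, 1, 1],
], dtype=float)

Z = three_stage_reduce(X, keep=4, threshold=0.95, n_components=2)
print(Z.shape)
```

The reduced matrix `Z` would then be passed to a clustering algorithm in place of `X`. Running the two selection stages before extraction reflects the abstract's point that FE methods are sensitive to irrelevant features, while the final extraction stage is meant to retain information that pure selection might discard.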
Pages: 156-169
Number of pages: 14
Related papers
50 in total
  • [31] Inference for probabilistic unsupervised text clustering
    Rigouste, Lois
    Cappe, Olivier
    Yvon, Francois
    2005 IEEE/SP 13th Workshop on Statistical Signal Processing (SSP), Vols 1 and 2, 2005, : 351 - 356
  • [32] Rainforest: A three-stage distribution adaptation framework for unsupervised time series domain adaptation
    Zhong, Yingyi
    Zhou, Wen'an
    NEUROCOMPUTING, 2024, 609
  • [33] Three-Stage Clustering Procedure for Deriving the Typical Load Curves of the Electricity Consumers
    Panapakidis, Ioannis P.
    Alexiadis, Minas C.
    Papagiannis, Grigoris K.
    2013 IEEE GRENOBLE POWERTECH (POWERTECH), 2013,
  • [34] A three-stage hybrid clustering system for diagnosing children with primary headache disorder
    Simic, Svetlana
    Sakac, Sladana
    Bankovic, Zorana
    Villar, Jose R.
    Luis Calvo-Rolle, Jose
    Simic, Svetislav D.
    Simic, Dragan
    LOGIC JOURNAL OF THE IGPL, 2023, 31 (02) : 300 - 313
  • [35] A three-stage strategy for optimal price offering by a retailer based on clustering techniques
    Mahmoudi-Kohan, N.
    Moghaddam, M. Parsa
    Sheikh-El-Eslami, M. K.
    Shayesteh, E.
    INTERNATIONAL JOURNAL OF ELECTRICAL POWER & ENERGY SYSTEMS, 2010, 32 (10) : 1135 - 1142
  • [36] A clustering algorithm for stream data with LDA-based unsupervised localized dimension reduction
    Laohakiat, Sirisup
    Phimoltares, Suphakant
    Lursinsap, Chidchanok
    INFORMATION SCIENCES, 2017, 381 : 104 - 123
  • [37] A Three-stage Optimization Method for Dynamic Optimal Power Flow
    Bai, Yang
    Zhong, Haiwang
    Xia, Qing
    Yang, Zhifang
    2014 INTERNATIONAL CONFERENCE ON POWER SYSTEM TECHNOLOGY (POWERCON), 2014,
  • [38] Three-stage decision method for production scheduling under uncertainty
    School of Control Science and Engineering, Shandong University, Jinan 250061, China
    Kong Zhi Li Lun Yu Ying Yong, 2008, 6 (1158-1162+1166): : 1158 - 1162
  • [39] A Three-Stage Optimization Method for Assembly Line Balancing Problem
    Yin, Qidong
    Luo, Xiaochuan
    IEEE ACCESS, 2020, 8 : 143607 - 143621
  • [40] Three-Stage Method for Selecting Informative Genes for Cancer Classification
    Mohamad, Mohd Saberi
    Omatu, Sigeru
    Deris, Safaai
    Yoshioka, Michifumi
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2009, 4 (06) : 725 - 730