Feature Selection based on Supervised Topic Modeling for Boosting-Based Multi-Label Text Categorization

Cited by: 0
Authors
Al-Salemi, Bassam [1]
Ayob, Masri [1]
Noah, Shahrul Azman Mohd [1]
Ab Aziz, Mohd Juzaiddin [1]
Affiliation
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi, Malaysia
Keywords
AdaBoost.MH; feature selection; text categorization; supervised topic modeling; Latent Dirichlet Allocation; ALGORITHM;
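DOI
Not available
Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808; 0809;
Abstract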
The Bag-of-Words text representation model is a simple and typical model that uses single words as the elements representing texts in the feature space. However, using single words as features produces a high-dimensional feature space, which increases the computational cost of learning, particularly for ensemble learning algorithms such as the boosting algorithm AdaBoost.MH. A straightforward solution is to use a feature selection method capable of reducing the feature space effectively. This work describes how to use the supervised topic model Labeled Latent Dirichlet Allocation for feature selection, as well as for accelerating AdaBoost.MH learning in multi-label text categorization. Experimental results on three benchmarks demonstrate that using Labeled Latent Dirichlet Allocation for feature selection improves and accelerates AdaBoost.MH and exceeds the performance of three existing methods.
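The abstract does not say how the topic model's output is turned into a reduced feature set. As a minimal sketch of one plausible reading, assume that a trained Labeled Latent Dirichlet Allocation model yields one word-weight distribution per label, and that the selected vocabulary is the union of the highest-weighted words for each label. The function names `select_features` and `project`, the parameter `top_k`, and the toy weights are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def select_features(label_word_weights: Dict[str, Dict[str, float]],
                    top_k: int) -> Set[str]:
    """Return the union of the top_k highest-weighted words of every label.

    label_word_weights maps each label to a {word: weight} dictionary,
    standing in for the per-label word distributions of a topic model.
    """
    selected: Set[str] = set()
    for weights in label_word_weights.values():
        # Rank by descending weight; break ties alphabetically so the
        # result is deterministic.
        ranked = sorted(weights, key=lambda w: (-weights[w], w))
        selected.update(ranked[:top_k])
    return selected


def project(doc_tokens: List[str], vocabulary: Set[str]) -> List[str]:
    """Keep only the tokens of a document that survive feature selection."""
    return [t for t in doc_tokens if t in vocabulary]


# Toy per-label word weights, invented purely for illustration.
toy_weights = {
    "sports": {"match": 0.5, "team": 0.3, "the": 0.15, "stock": 0.05},
    "finance": {"stock": 0.6, "market": 0.25, "the": 0.1, "team": 0.05},
}

vocab = select_features(toy_weights, top_k=2)
reduced = project(["the", "team", "won", "the", "match"], vocab)
```

In this toy case the common word "the" never ranks in the top two for either label, so it is dropped from the vocabulary. A boosting learner such as AdaBoost.MH would then search for weak hypotheses over the smaller vocabulary only, which is where the reduction in learning cost described above would come from.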
Pages: 6