Feature Selection based on Supervised Topic Modeling for Boosting-Based Multi-Label Text Categorization

Cited by: 0
Authors
Al-Salemi, Bassam [1 ]
Ayob, Masri [1 ]
Noah, Shahrul Azman Mohd [1 ]
Ab Aziz, Mohd Juzaiddin [1 ]
Affiliations
[1] Univ Kebangsaan Malaysia, Fac Informat Sci & Technol, Bangi, Malaysia
Keywords
AdaBoost.MH; feature selection; text categorization; supervised topic modeling; Latent Dirichlet Allocation; ALGORITHM;
DOI
Not available
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline classification code
0808 ; 0809 ;
Abstract
The Bag-of-Words text representation model is a simple and typical model that uses single words as the elements representing texts in the feature space. However, using single words as features produces a high-dimensional feature space, which increases the computational cost of learning, particularly for ensemble learning algorithms such as the boosting algorithm AdaBoost.MH. A straightforward solution is to use a feature selection method capable of reducing the feature space effectively. This work describes how to use the supervised topic model Labeled Latent Dirichlet Allocation for feature selection, as well as for accelerating AdaBoost.MH learning in multi-label text categorization. Experimental results on three benchmarks demonstrate that using Labeled Latent Dirichlet Allocation for feature selection improves and accelerates AdaBoost.MH and exceeds the performance of three existing methods.
Pages: 6
Related papers
50 in total
  • [21] Multi-label dataless text classification with topic modeling
    Zha, Daochen
    Li, Chenliang
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 61 (01) : 137 - 160
  • [22] Multi-label feature selection based on correlation label enhancement
    He, Zhuoxin
    Lin, Yaojin
    Wang, Chenxi
    Guo, Lei
    Ding, Weiping
    INFORMATION SCIENCES, 2023, 647
  • [23] Multi-label feature selection based on the division of label topics
    Zhang, Ping
    Gao, Wanfu
    Hu, Juncheng
    Li, Yonghao
    INFORMATION SCIENCES, 2021, 553 : 129 - 153
  • [24] Sparse semi-supervised multi-label feature selection based on latent representation
    Zhao, Xue
    Li, Qiaoyan
    Xing, Zhiwei
    Yang, Xiaofei
    Dai, Xuezhen
    COMPLEX & INTELLIGENT SYSTEMS, 2024, 10 (04) : 5139 - 5151
  • [25] ReliefF-based multi-label feature selection
    2015, Science and Engineering Research Support Society (08):
  • [26] Alignment Based Feature Selection for Multi-label Learning
    Chen, Linlin
    Chen, Degang
    NEURAL PROCESSING LETTERS, 2019, 50 (03) : 2323 - 2344
  • [27] Multi-label feature selection based on nonlinear mapping
    Wang, Yan
    Wang, Changzhong
    Deng, Tingquan
    Li, Wenqi
    INFORMATION SCIENCES, 2024, 680
  • [28] Alignment Based Feature Selection for Multi-label Learning
    Linlin Chen
    Degang Chen
    Neural Processing Letters, 2019, 50 : 2323 - 2344
  • [29] BoostFS: A Boosting-Based Irrelevant Feature Selection Algorithm
    Miao, Qi-Guang
    Cao, Ying
    Song, Jian-Feng
    Liu, Jiachen
    Quan, Yining
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2015, 29 (07)
  • [30] TREEBOOST.MH: A boosting algorithm for multi-label hierarchical text categorization
    Esuli, Andrea
    Fagni, Tiziano
    Sebastiani, Fabrizio
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2006, 4209 : 13 - 24