Topic selection for text classification using ensemble topic modeling with grouping, scoring, and modeling approach

Cited by: 1
Authors
Voskergian, Daniel [1]
Jayousi, Rashid [2]
Yousef, Malik [3]
Affiliations
[1] Al Quds Univ, Comp Engn Dept, Jerusalem, Palestine
[2] Al Quds Univ, Comp Sci Dept, Jerusalem, Palestine
[3] Zefat Acad Coll, Dept Informat Syst, Safed, Israel
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
Topic model; Topic selection; Feature selection; Ensemble learning; Text classification; Machine learning
DOI
10.1038/s41598-024-74022-2
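Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Subject classification codes
07; 0710; 09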
Abstract
TextNetTopics (Yousef et al. in Front Genet 13:893378, 2022. https://doi.org/10.3389/fgene.2022.893378) is a recently developed approach that performs text classification using topics (a topic is a group of terms or words) extracted by Latent Dirichlet Allocation (LDA) topic modeling as features, rather than individual words. This allows TextNetTopics to achieve dimensionality reduction while preserving and embedding more thematic and semantic information in the document representations. In this article, we introduce a novel approach, the Ensemble Topic Model for Topic Selection (ENTM-TS), which advances TextNetTopics. ENTM-TS integrates multiple topic models using the Grouping, Scoring, and Modeling approach, thereby mitigating the performance variability that arises when TextNetTopics relies on any single topic modeling method. Additionally, we performed a thorough comparative study to evaluate the performance of TextNetTopics with eleven state-of-the-art topic modeling algorithms. We used the topics extracted by each algorithm as input to the Grouping (G) component of the TextNetTopics tool in order to select the most compelling topic model with respect to its predictive behavior for text classification. We conducted our evaluation on the Drug-Induced Liver Injury (DILI) textual dataset from the CAMDA community and on the WOS-5736 dataset. The experimental results show that Latent Semantic Indexing (LSI) achieves comparable performance with fewer discriminative features than the other topic modeling methods. Moreover, our evaluation shows that ENTM-TS matches or surpasses the best results obtained with individual topic models on both datasets, establishing it as a robust and effective enhancement for text classification.
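The abstract describes the Grouping, Scoring, and Modeling (G-S-M) idea only at a high level. The following is a minimal, hypothetical sketch of one plausible reading of it, not the authors' implementation: topics from several topic models are pooled into one candidate set (Grouping), each topic is scored by how well a classifier trained only on that topic's terms predicts the labels (Scoring), and the top-k topics are merged into a single feature set that is evaluated again (Modeling). Assumptions: documents are term sets with binary labels; topics are given as ready-made term lists (in practice they would come from LDA, LSI, and so on); a leave-one-out nearest-centroid classifier stands in for whatever classifier and validation scheme the real tool uses; the functions `topic_features`, `nearest_centroid_loo`, and `gsm_select` and the toy data are all made up for illustration.

```python
from statistics import mean


def topic_features(doc_terms, topic_terms):
    """Project a document onto one topic: a binary vector over the topic's terms."""
    return [1.0 if term in doc_terms else 0.0 for term in topic_terms]


def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def nearest_centroid_loo(vectors, labels):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumed stand-in for the cross-validated classifier that a real
    Scoring step would train on each topic's terms.
    """
    correct = 0
    for i, (x, y) in enumerate(zip(vectors, labels)):
        centroids = {}
        for cls in set(labels):
            rows = [v for j, (v, lab) in enumerate(zip(vectors, labels))
                    if j != i and lab == cls]
            if rows:
                # Column-wise mean of this class's training rows.
                centroids[cls] = [mean(col) for col in zip(*rows)]
        # Ties are broken by class label so the result is deterministic.
        pred = min(centroids, key=lambda c: (squared_distance(x, centroids[c]), c))
        correct += pred == y
    return correct / len(vectors)


def gsm_select(docs, topic_models, k):
    """Hypothetical G-S-M loop over topics pooled from several topic models."""
    labels = [y for _, y in docs]
    # Grouping (ensemble reading, assumed): pool every model's topics into one
    # candidate set, keyed by (model name, topic index).
    groups = {(name, i): sorted(set(terms))
              for name, topics in topic_models.items()
              for i, terms in enumerate(topics)}
    # Scoring: evaluate each topic alone, using only its terms as features.
    scores = {gid: nearest_centroid_loo(
                  [topic_features(doc_terms, topic) for doc_terms, _ in docs], labels)
              for gid, topic in groups.items()}
    # Rank by score (descending), breaking ties by group id.
    ranked = sorted(scores, key=lambda gid: (-scores[gid], gid))
    # Modeling: merge the top-k topics into one feature set and evaluate it.
    selected = ranked[:k]
    features = sorted(set().union(*(groups[gid] for gid in selected)))
    accuracy = nearest_centroid_loo(
        [topic_features(doc_terms, features) for doc_terms, _ in docs], labels)
    return selected, accuracy


# Toy, made-up corpus: (document terms, binary label).
docs = [
    (frozenset({"liver", "injury", "drug"}), 1),
    (frozenset({"liver", "injury", "dose"}), 1),
    (frozenset({"liver", "injury", "hepatotoxicity"}), 1),
    (frozenset({"gene", "protein", "cell"}), 0),
    (frozenset({"gene", "protein", "expression"}), 0),
    (frozenset({"gene", "protein", "dose"}), 0),
]
# Made-up topics standing in for the output of two topic models.
topic_models = {
    "lda": [["liver", "injury", "hepatotoxicity"], ["cell", "dose", "drug"]],
    "lsi": [["gene", "protein", "expression"], ["dose", "cell", "sequence"]],
}

print(gsm_select(docs, topic_models, k=2))
# → ([('lda', 0), ('lsi', 0)], 1.0)
```

Because each topic is scored on its own, topics produced by different models compete on the same footing; in this toy run the two topics whose terms separate the classes are kept, one from each model, while the two noisy topics score zero and are dropped.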
Pages: 19
Related papers (50 in total)
  • [41] Using SVD for Topic Modeling
    Ke, Zheng Tracy
    Wang, Minzhe
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (545) : 434 - 449
  • [42] Building Vietnamese Topic Modeling Based on Core Terms and Applying in Text Classification
    Ha Nguyen Thi Thu
    Tinh Dao Thanh
    Thanh Nguyen Hai
    Vinh Ho Ngoc
    2015 FIFTH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS AND NETWORK TECHNOLOGIES (CSNT2015), 2015: 1284 - 1288
  • [43] An enhanced few-shot text classification approach by integrating topic modeling and prompt-tuning
    Zhang, Yinghui
    Xu, Yichun
    Dong, Fangmin
    NEUROCOMPUTING, 2025, 617
  • [44] Ensemble topic modeling using weighted term co-associations
    Belford, Mark
    Greene, Derek
    EXPERT SYSTEMS WITH APPLICATIONS, 2020, 161
  • [45] Anchor Prediction: A Topic Modeling Approach
    Dupuy, Jean
    Guille, Adrien
    Jacques, Julien
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022: 1310 - 1318
  • [46] An Empirical Bayes Approach to Topic Modeling
    Gangopadhyay, Anirban
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021: 2803 - 2808
  • [47] An Approach for Analyzing Unstructured Text Data Using Topic Modeling Techniques for Efficient Information Extraction
    Zadgaonkar, Ashwini
    Agrawal, Avinash J.
    NEW GENERATION COMPUTING, 2024, 42 (01) : 109 - 134
  • [49] Extractive text summarization using clustering-based topic modeling
    Belwal, Ramesh Chandra
    Rai, Sawan
    Gupta, Atul
    SOFT COMPUTING, 2023, 27 (07) : 3965 - 3982
  • [50] Web objectionable text content detection using topic modeling technique
    Duan, Jiangjiao
    Zeng, Jianping
    EXPERT SYSTEMS WITH APPLICATIONS, 2013, 40 (15) : 6094 - 6104