Topic selection for text classification using ensemble topic modeling with grouping, scoring, and modeling approach

Times cited: 1

Authors:

Voskergian, Daniel [1]

Jayousi, Rashid [2]

Yousef, Malik [3]

Affiliations:

[1] Al Quds Univ, Comp Engn Dept, Jerusalem, Palestine

[2] Al Quds Univ, Comp Sci Dept, Jerusalem, Palestine

[3] Zefat Acad Coll, Dept Informat Syst, Safed, Israel

Source:

SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01

Keywords:

Topic model; Topic selection; Feature selection; Ensemble learning; Text classification; Machine learning

DOI:

10.1038/s41598-024-74022-2

Chinese Library Classification (CLC) codes:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

TextNetTopics (Yousef et al. in Front Genet 13:893378, 2022. https://doi.org/10.3389/fgene.2022.893378) is a recently developed text classification approach that uses topics (a topic is a group of terms or words) extracted by Latent Dirichlet Allocation (LDA) topic modeling as features, rather than individual words. This enables TextNetTopics to reduce dimensionality while preserving and embedding more thematic and semantic information in the document representations. In this article, we introduce a novel approach, the Ensemble Topic Model for Topic Selection (ENTM-TS), which extends TextNetTopics. ENTM-TS integrates multiple topic models using the Grouping, Scoring, and Modeling approach, thereby mitigating the performance variability that arises when TextNetTopics relies on a single topic modeling method. Additionally, we performed a thorough comparative study to evaluate TextNetTopics' performance with eleven state-of-the-art topic modeling algorithms. For each algorithm, we used the extracted topics as input to the G (Grouping) component of the TextNetTopics tool in order to identify the topic model with the best predictive performance for text classification. We conducted our comprehensive evaluation on the Drug-Induced Liver Injury textual dataset from the CAMDA community and on the WOS-5736 dataset. The experimental results show that Latent Semantic Indexing (LSI) achieves comparable performance with fewer discriminative features than the other topic modeling methods. Moreover, our evaluation reveals that ENTM-TS matches or surpasses the best results obtained with individual topic models on both datasets, establishing it as a robust and effective enhancement for text classification tasks.
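To make the Grouping, Scoring, and Modeling idea concrete, here is a minimal, dependency-free Python sketch. It is not the TextNetTopics or ENTM-TS implementation. The bag-of-words document format, the helper names (`topic_features`, `loo_accuracy`, `score_topics`, `select_features`), the toy documents, and the hand-written "lda"/"lsi" topic lists are all hypothetical. Scoring uses a nearest-centroid classifier with leave-one-out accuracy as a stand-in for whatever classifier and validation scheme the actual tool uses. The ensemble step is assumed to pool topics from every model into one candidate set before ranking; the paper's exact ensembling procedure may differ.

```python
from math import sqrt

def topic_features(doc, words):
    """Grouping (G): project a bag-of-words document onto one topic's words."""
    return [doc.get(w, 0) for w in words]

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def distance(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def loo_accuracy(X, y):
    """Scoring (S) stand-in: leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i in range(len(X)):
        train = [(x, lab) for j, (x, lab) in enumerate(zip(X, y)) if j != i]
        classes = sorted({lab for _, lab in train})
        cents = {c: centroid([x for x, lab in train if lab == c]) for c in classes}
        # Ties go to the first class in sorted order.
        pred = min(classes, key=lambda c: distance(X[i], cents[c]))
        correct += pred == y[i]
    return correct / len(X)

def score_topics(docs, labels, topics):
    """Score each topic using only the words that belong to it."""
    return {name: loo_accuracy([topic_features(d, words) for d in docs], labels)
            for name, words in topics.items()}

def select_features(docs, labels, topic_models, k):
    """Modeling (M): keep the words of the k best-scoring topics.

    Assumption: the ensemble pools topics from all models, prefixed by model name.
    """
    pooled = {f"{m}:{t}": words
              for m, model in topic_models.items() for t, words in model.items()}
    scores = score_topics(docs, labels, pooled)
    # sorted() is stable, so equal scores keep their pooled order.
    top = sorted(pooled, key=lambda name: scores[name], reverse=True)[:k]
    vocab = sorted({w for name in top for w in pooled[name]})
    return top, vocab

# Hypothetical toy data: label 1 = liver-injury text, label 0 = other text.
docs = [
    {"liver": 2, "injury": 1, "drug": 1},
    {"liver": 1, "injury": 2, "drug": 1},
    {"liver": 2, "injury": 2},
    {"graph": 2, "node": 1, "drug": 1},
    {"graph": 1, "node": 2, "drug": 1},
    {"graph": 2, "node": 2},
]
labels = [1, 1, 1, 0, 0, 0]
# Hand-written topic lists standing in for the output of two topic models.
topic_models = {
    "lda": {"t0": ["liver", "injury"], "t1": ["drug", "dose"]},
    "lsi": {"t0": ["graph", "node"], "t1": ["drug"]},
}

top, vocab = select_features(docs, labels, topic_models, k=2)
print(top, vocab)  # → ['lda:t0', 'lsi:t0'] ['graph', 'injury', 'liver', 'node']
```

Because each topic is scored only on its own words, the final feature set is the union of a few high-scoring topics rather than the full vocabulary. This is the kind of dimensionality reduction the abstract describes. The word "drug", which appears equally in both classes, ends up in no selected topic.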

Pages: 19
