Topic selection for text classification using ensemble topic modeling with grouping, scoring, and modeling approach

Cited by: 1
Authors
Voskergian, Daniel [1]
Jayousi, Rashid [2]
Yousef, Malik [3]
Affiliations
[1] Al Quds Univ, Comp Engn Dept, Jerusalem, Palestine
[2] Al Quds Univ, Comp Sci Dept, Jerusalem, Palestine
[3] Zefat Acad Coll, Dept Informat Syst, Safed, Israel
Source
SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01
Keywords
Topic model; Topic selection; Feature selection; Ensemble learning; Text classification; Machine learning
DOI
10.1038/s41598-024-74022-2
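Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Subject classification code
07; 0710; 09
Abstract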
TextNetTopics (Yousef et al. in Front Genet 13:893378, 2022. https://doi.org/10.3389/fgene.2022.893378) is a recently developed approach that performs text classification using topics (a topic is a group of terms or words) extracted by Latent Dirichlet Allocation topic modeling as features, rather than individual words. This enables TextNetTopics to achieve dimensionality reduction while preserving and embedding more thematic and semantic information in the text document representations. In this article, we introduce a novel approach, the Ensemble Topic Model for Topic Selection (ENTM-TS), an advancement of TextNetTopics. ENTM-TS integrates multiple topic models using the Grouping, Scoring, and Modeling approach, thereby mitigating the performance variability introduced by employing individual topic modeling methods within TextNetTopics. Additionally, we perform a thorough comparative study to evaluate TextNetTopics' performance using eleven state-of-the-art topic modeling algorithms. We use the topics extracted by each algorithm as input to the G component of the TextNetTopics tool to select the most compelling topic model in terms of predictive behavior for text classification. We conduct our evaluation on the Drug-Induced Liver Injury textual dataset from the CAMDA community and the WOS-5736 dataset. The experimental results show that Latent Semantic Indexing provides comparable performance with fewer discriminative features than other topic modeling methods. Moreover, our evaluation reveals that the performance of ENTM-TS surpasses or matches the best results obtained from individual topic models across the two datasets, establishing it as a robust and effective enhancement for text classification tasks.
Pages: 19
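The abstract above describes pooling topics from several topic models and passing them through a Grouping, Scoring, and Modeling pipeline, but it does not say how groups are scored or which classifier is used. The following is only a minimal sketch of that general idea under stated assumptions: the topics and documents are toy data, a nearest-centroid classifier stands in for whatever scorer the authors use, and the names `pool_topics`, `score_terms`, and `gsm` are hypothetical and not taken from the TextNetTopics or ENTM-TS code.

```python
from collections import Counter


def pool_topics(models):
    """G (grouping): pool topics from several topic models into named term groups."""
    pooled = {}
    for model_name, topics in models.items():
        for i, terms in enumerate(topics):
            pooled[f"{model_name}_{i}"] = list(terms)
    return pooled


def vectorize(tokens, terms):
    """Represent one tokenized document by the counts of the given terms only."""
    counts = Counter(tokens)
    return [counts[t] for t in terms]


def score_terms(train, held_out, terms):
    """Held-out accuracy of a nearest-centroid classifier (assumed stand-in scorer)."""
    by_label = {}
    for tokens, label in train:
        by_label.setdefault(label, []).append(vectorize(tokens, terms))
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in by_label.items()
    }
    correct = 0
    for tokens, label in held_out:
        vec = vectorize(tokens, terms)
        pred = min(
            centroids,
            key=lambda c: sum((x - y) ** 2 for x, y in zip(vec, centroids[c])),
        )
        correct += pred == label
    return correct / len(held_out)


def gsm(pooled, train, held_out):
    """S: rank each topic on its own. M: evaluate cumulative sets of top topics."""
    ranked = sorted(
        pooled, key=lambda name: score_terms(train, held_out, pooled[name]), reverse=True
    )
    results, terms = [], []
    for name in ranked:
        terms += [t for t in pooled[name] if t not in terms]
        results.append((name, len(terms), score_terms(train, held_out, terms)))
    return ranked, results


# Toy data (invented for illustration): label 1 = liver-injury related, 0 = other.
train = [
    ("liver injury toxic drug patient".split(), 1),
    ("hepatic toxic enzyme drug report".split(), 1),
    ("gene expression cell study patient".split(), 0),
    ("protein cell binding study report".split(), 0),
]
held_out = [
    ("drug toxic liver report".split(), 1),
    ("cell protein study patient".split(), 0),
]
# Hypothetical outputs of two topic models; real topics would come from fitted models.
models = {
    "lda": [["liver", "toxic", "injury"], ["patient", "report"]],
    "lsi": [["cell", "protein"]],
}

pooled = pool_topics(models)
ranked, results = gsm(pooled, train, held_out)
for name, n_terms, acc in results:
    print(name, n_terms, acc)
```

In this toy run the uninformative topic (`patient`, `report`) is ranked last, which illustrates why selecting topics by their individual predictive score can shrink the feature set. A real pipeline would use a proper validation scheme rather than a single two-document held-out set.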