Classification of Text Documents Based on a Probabilistic Topic Model

Cited by: 0
Authors
Karpovich, S. N. [1]
Smirnov, A. V. [2]
Teslya, N. N. [2]
Affiliations
[1] Olymp Corp, Moscow 121205, Russia
[2] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg 199178, Russia
Funding
Russian Foundation for Basic Research
Keywords
classification; binary classification; topic modeling; natural language processing; SUPPORT;
DOI
10.3103/S0147688219050034
Chinese Library Classification
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code
1205 ; 120501 ;
Abstract
An approach to text document classification is proposed that uses a probabilistic topic model whose training set contains documents of only one class. This approach makes it possible to identify positive samples (samples resembling the target class) in collections and streams of text documents. The article reviews models designed for text document classification and trained on samples of a single class, and describes their key features. The Positive Example Based Learning-TM classification model is presented, and a software prototype that implements it is developed as a basis for classifying text documents. Despite having no information about negative document samples, the model demonstrates a high level of classification accuracy that exceeds the performance of alternative approaches. Experiments show that the Positive Example Based Learning-TM model is superior in classification accuracy when a small training set is used.
Pages: 314-320
Number of pages: 7