Classification of Text Documents Based on a Probabilistic Topic Model

Cited by: 0
Authors
Karpovich, S. N. [1 ]
Smirnov, A. V. [2 ]
Teslya, N. N. [2 ]
Affiliations
[1] Olymp Corp, Moscow 121205, Russia
[2] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg 199178, Russia
Funding
Russian Foundation for Basic Research;
Keywords
classification; binary classification; topic modeling; natural language processing; SUPPORT;
DOI
10.3103/S0147688219050034
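Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work];
Subject classification code
1205 ; 120501 ;
Abstract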
This article proposes an approach to text document classification based on a probabilistic topic model whose training set contains documents of only one class. The approach makes it possible to identify positive samples (samples resembling the target class) in collections and streams of text documents. The article reviews text classification models that are trained on samples of a single class and describes their key features. It presents the Positive Example Based Learning-TM classification model and a software prototype that implements it as a basis for classifying text documents. Despite having no information about negative document samples, the model demonstrates a high level of classification accuracy that exceeds the performance of alternative approaches. Experiments show that the Positive Example Based Learning-TM model is more accurate than the alternatives when the training set is small.
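The idea of learning from positive examples only can be illustrated with a small sketch. This is not the authors' Positive Example Based Learning-TM model, whose details are not given in this record. It is a deliberately simplified stand-in: a single-topic (unigram) word distribution estimated from positive documents, with an acceptance threshold derived from the training scores. All function names, the smoothing constant `alpha`, and the `margin` value are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real preprocessing."""
    return text.lower().split()


def train_positive_model(positive_docs, alpha=1.0):
    """Estimate a smoothed word distribution p(w | positive) from one class only.

    Assumption: a single-topic unigram model, not a full topic model.
    """
    counts = Counter(tok for doc in positive_docs for tok in tokenize(doc))
    vocab_size = len(counts) + 1  # one extra slot reserves mass for unseen words
    denom = sum(counts.values()) + alpha * vocab_size

    def log_prob(word):
        return math.log((counts.get(word, 0) + alpha) / denom)

    return log_prob


def score(log_prob, text):
    """Average per-token log-likelihood of a document under the positive model."""
    tokens = tokenize(text)
    if not tokens:
        return float("-inf")
    return sum(log_prob(t) for t in tokens) / len(tokens)


def fit_threshold(log_prob, positive_docs, margin=0.5):
    """Accept anything scoring no worse than the weakest training document minus a margin."""
    return min(score(log_prob, d) for d in positive_docs) - margin


def is_positive(log_prob, threshold, text):
    """Binary decision made without ever seeing a negative example."""
    return score(log_prob, text) >= threshold


# Toy data, invented for illustration only.
positive_docs = [
    "topic model text classification",
    "probabilistic topic model for text documents",
    "text classification with topic model",
]
model = train_positive_model(positive_docs)
threshold = fit_threshold(model, positive_docs)
on_topic = is_positive(model, threshold, "topic model for text")
off_topic = is_positive(model, threshold, "football match results yesterday")
```

On this toy data the on-topic query is accepted and the off-topic query is rejected. A real system would replace the unigram model with a proper topic model and tune the threshold on held-out positive documents.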
Pages: 314 - 320
Page count: 7
Related papers
50 in total
  • [21] Text Categorization Based on Topic Model
    School of Computer Science and Technology, China University of Mining and Technology, Jiangsu Province, Xuzhou
    221116, China
    Unknown
    100081, China
    Int. J. Comput. Intell. Syst., 2009, 4 (398-409): : 398 - 409
  • [22] TextNetTopics Pro, a topic model-based text classification for short text by integration of semantic and document-topic distribution information
    Voskergian, Daniel
    Bakir-Gungor, Burcu
    Yousef, Malik
    FRONTIERS IN GENETICS, 2023, 14
  • [23] MULTILAYER ADAPTIVE FUZZY PROBABILISTIC NEURAL NETWORK IN CLASSIFICATION PROBLEMS OF TEXT DOCUMENTS
    Bodyanskiy, E. V.
    Ryabova, N. V.
    Zolotukhin, O. V.
    RADIO ELECTRONICS COMPUTER SCIENCE CONTROL, 2015, 1 : 39 - 45
  • [24] Augmenting Labeled Probabilistic Topic Model for Web Service Classification
    Pang, Shengye
    Zou, Guobing
    Gan, Yanglan
    Niu, Sen
    Zhang, Bofeng
    INTERNATIONAL JOURNAL OF WEB SERVICES RESEARCH, 2019, 16 (01) : 93 - 113
  • [25] Authorship attribution based on a probabilistic topic model
    Savoy, Jacques
    INFORMATION PROCESSING & MANAGEMENT, 2013, 49 (01) : 341 - 354
  • [26] Topic document model approach for naive Bayes text classification
    Kim, SB
    Rim, HC
    Kim, JD
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D (05): : 1091 - 1094
  • [27] Short text classification using semantically enriched topic model
    Uddin, Farid
    Chen, Yibo
    Zhang, Zuping
    Huang, Xin
    JOURNAL OF INFORMATION SCIENCE, 2024,
  • [28] A probabilistic model for classification of multiple-record Web documents
    Tang, J
    Ng, YK
    OOIS 2000: 6TH INTERNATIONAL CONFERENCE ON OBJECT ORIENTED INFORMATION SYSTEMS, PROCEEDINGS, 2001, : 349 - 357
  • [29] Probabilistic topic modeling for short text based on word embedding networks
    Pita, Marcelo
    Nunes, Matheus
    Pappa, Gisele L.
    APPLIED INTELLIGENCE, 2022, 52 (15) : 17829 - 17844
  • [30] Probabilistic topic modeling for short text based on word embedding networks
    Marcelo Pita
    Matheus Nunes
    Gisele L. Pappa
    Applied Intelligence, 2022, 52 : 17829 - 17844