Classification of Text Documents Based on a Probabilistic Topic Model

Cited by: 0
Authors
Karpovich, S. N. [1 ]
Smirnov, A. V. [2 ]
Teslya, N. N. [2 ]
Affiliations
[1] Olymp Corp, Moscow 121205, Russia
[2] Russian Acad Sci SPIIRAS, St Petersburg Inst Informat & Automat, St Petersburg 199178, Russia
Funding
Russian Foundation for Basic Research
Keywords
classification; binary classification; topic modeling; natural language processing; SUPPORT;
DOI
10.3103/S0147688219050034
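Chinese Library Classification number
G25 [Library science, librarianship]; G35 [Information science, information work];
Discipline classification code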
1205 ; 120501 ;
Abstract
This article proposes an approach to text document classification based on a probabilistic topic model whose training document set contains objects of only one class. The approach makes it possible to identify positive samples (samples resembling the target class) in collections and streams of text documents. The article reviews models designed for text document classification and trained on samples of a single class, and describes their key features. The Positive Example Based Learning-TM classification model is presented, and a software prototype that implements it as a basis for text document classification is developed. Despite having no information about negative document samples, the model achieves a high level of classification accuracy that exceeds that of alternative approaches. Experiments show that the Positive Example Based Learning-TM model is superior in classification accuracy when a small training set is used.
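The idea of training on a single class can be illustrated with a minimal sketch. It is not the Positive Example Based Learning-TM model, whose details this record does not give. It assumes a much simpler stand-in: a smoothed unigram model (effectively one topic) fitted to positive documents only, with a document accepted when its average per-token log-likelihood is not much lower than the worst training document. The function names, the smoothing constant `alpha`, the `slack` margin, and the toy documents are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would normalize further.
    return text.lower().split()


def train_positive_model(docs, alpha=0.1):
    # Fit a smoothed unigram distribution on positive documents only.
    counts = Counter(tok for doc in docs for tok in tokenize(doc))
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def log_prob(tok):
        return math.log((counts.get(tok, 0) + alpha) / (total + alpha * vocab_size))

    return log_prob


def score(log_prob, doc):
    # Average per-token log-likelihood, so document length does not dominate.
    toks = tokenize(doc)
    if not toks:
        return float("-inf")
    return sum(log_prob(t) for t in toks) / len(toks)


def fit_threshold(log_prob, docs, slack=0.5):
    # Assumed rule: accept anything scoring within `slack` of the
    # lowest-scoring positive training document.
    return min(score(log_prob, d) for d in docs) - slack


# Hypothetical positive-only training set.
positive_docs = [
    "topic model for text classification",
    "text classification with a probabilistic topic model",
    "probabilistic model of text documents",
]
log_prob = train_positive_model(positive_docs)
threshold = fit_threshold(log_prob, positive_docs)


def is_positive(doc):
    return score(log_prob, doc) >= threshold
```

No negative samples are used anywhere: the decision threshold is derived from the positive training documents alone, which is the property the abstract emphasizes.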
Pages: 314-320
Number of pages: 7
Related papers
50 in total
  • [1] Classification of Text Documents Based on a Probabilistic Topic Model
    S. N. Karpovich
    A. V. Smirnov
    N. N. Teslya
    Scientific and Technical Information Processing, 2019, 46 : 314 - 320
  • [2] News Text Classification Model Based on Topic Model
    Li, Zhenzhong
    Shang, Wenqian
    Yan, Menghan
    2016 IEEE/ACIS 15TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2016, : 1197 - 1201
  • [3] SHORT TEXT CLASSIFICATION BASED ON LDA TOPIC MODEL
    Chen, Qiuxing
    Yao, Lixiu
    Yang, Jie
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2016, : 749 - 753
  • [4] Indexing text documents based on topic identification
    Butarbutar, M
    McRoy, S
    STRING PROCESSING AND INFORMATION RETRIEVAL, PROCEEDINGS, 2004, 3246 : 113 - 124
  • [5] SPARSE TOPIC MODEL FOR TEXT CLASSIFICATION
    Liu, Tao
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, : 1916 - 1920
  • [6] Text Classification of Network Pyramid Scheme based on Topic Model
    Mu, Pengyu
    He, Jingsha
    Zhu, Nafei
    NLPIR 2019: 2019 3RD INTERNATIONAL CONFERENCE ON NATURAL LANGUAGE PROCESSING AND INFORMATION RETRIEVAL, 2019, : 15 - 19
  • [7] A classification-based summarisation model for summarising text documents
    Hannah, M.E.
    Inderscience Enterprises Ltd., 29, route de Pre-Bois, Case Postale 856, CH-1215 Geneva 15, Switzerland, (06) : 3 - 4
  • [8] A probabilistic model for clustering text documents with multiple fields
    Zhu, Shanfeng
    Takigawa, Ichigaku
    Zhang, Shuqin
    Mamitsuka, Hiroshi
    ADVANCES IN INFORMATION RETRIEVAL, 2007, 4425 : 331 - +
  • [9] A hidden Markov model-based text classification of medical documents
    Yi, Kwan
    Beheshti, Jamshid
    JOURNAL OF INFORMATION SCIENCE, 2009, 35 (01) : 67 - 81
  • [10] Topic evolution based on the probabilistic topic model: a review
    Houkui Zhou
    Huimin Yu
    Roland Hu
    Frontiers of Computer Science, 2017, 11 : 786 - 802