Dataless Text Classification with Pseudo Topic Representation

Times cited: 0
Authors
Yan, Rong [1]
Chen, Qi [1]
Gao, Guanglai [1]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Inner Mongolia Key Lab Mongolian Informat Proc Te, Hohhot, Peoples R China
Keywords
topic representation; latent Dirichlet allocation; dataless text classification
DOI
10.1109/ICTAI50040.2020.00189
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A large body of research on latent-topic-based Dataless Text Classification (DTC), an approach to automatic text classification, has emerged in recent years. Pursuing good candidate seed words or guaranteeing the quality of the category-topics is the core task of this approach. However, few previous studies consider the quality of specific category-topics at the collection level rather than at the document level, even though not all topics are equally coherent and categories can be sparse. In this paper, we focus on alleviating the seed-word selection problem in DTC by using pseudo text understanding. Unlike existing latent-topic-based DTC approaches, we propose an unsupervised method named Pseudo Document Labeled Classification (PDLC). It extracts the most representative word list to capture the best latent semantic category-topic description. Experimental results indicate that our PDLC scheme achieves better classification accuracy without any labeled data or external resources.
Pages: 1255-1259
Number of pages: 5