Dataless Text Classification with Pseudo Topic Representation

Cited by: 0
Authors
Yan, Rong [1]
Chen, Qi [1]
Gao, Guanglai [1]
Affiliation
[1] Inner Mongolia Univ, Coll Comp Sci, Inner Mongolia Key Lab Mongolian Informat Proc Te, Hohhot, Peoples R China
Keywords
topic representation; latent Dirichlet allocation; dataless text classification
DOI
10.1109/ICTAI50040.2020.00189
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A large body of research on latent-topic-based Dataless Text Classification (DTC), an automatic text classification approach, has emerged in recent years. The core task of this approach is selecting candidate seed words, or otherwise guaranteeing the quality of the category-topics. However, few previous studies assess the quality of specific category-topics at the collection level rather than at the document level, which matters because not all topics are equally coherent and some categories are sparse. In this paper, we focus on alleviating the seed-word selection dilemma in DTC by using pseudo text understanding. Unlike existing latent-topic-based DTC approaches, we propose an unsupervised method named Pseudo Document Labeled Classification (PDLC), which extracts the most representative word list to best capture the latent semantic description of each category-topic. Experimental results indicate that PDLC achieves better classification accuracy without any labeled data or external resources.
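The abstract does not spell out how PDLC selects or scores words, so the following is only a generic sketch of seed-word-based dataless classification, not the authors' algorithm. It assumes each category is named by one anchor word plus a list of candidate seed words (both hypothetical inputs), ranks the candidates with a UMass-style document co-occurrence score computed over the whole collection, reflecting the collection-level view the abstract argues for, and then labels a document by counting occurrences of each category's selected words. The function names `select_representative_words` and `classify` are illustrative only.

```python
import math
from collections import Counter


def select_representative_words(docs, anchor, candidates, k):
    """Keep the k candidates most associated with `anchor` across the whole collection.

    Hypothetical sketch (not PDLC): score(w) = log((D(w, anchor) + 1) / D(w)),
    a UMass-style coherence term where D(.) counts documents in the collection.
    `anchor` is an assumed category-name seed word supplied by the user.
    """
    doc_sets = [set(doc) for doc in docs]
    df = Counter(w for s in doc_sets for w in s)  # document frequency per word

    def score(word):
        if df[word] == 0:
            return float("-inf")  # unseen in the collection: cannot be representative
        joint = sum(1 for s in doc_sets if word in s and anchor in s)
        return math.log((joint + 1) / df[word])

    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted((w for w in candidates if w != anchor),
                    key=lambda w: (-score(w), w))
    return [anchor] + ranked[:k]


def classify(doc, word_lists):
    """Assign the category whose word list occurs most often in `doc`.

    Ties go to the alphabetically first category name.
    """
    counts = Counter(doc)
    scores = {cat: sum(counts[w] for w in words) for cat, words in word_lists.items()}
    return max(sorted(scores), key=lambda cat: scores[cat])
```

For example, on a four-document toy corpus about sports and finance, anchoring "team" ranks "coach" and "goal" ahead of "price", which never co-occurs with "team"; the resulting word lists then label "the coach praised the team" as sports without any labeled training data. The latent-topic-based DTC methods the abstract describes would instead feed such word lists into a topic model such as LDA rather than scoring documents by raw counts.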
Pages: 1255–1259
Page count: 5
Related papers
(50 in total)
  • [21] Topic Classification Based on Distributed Document Representation and Latent Topic Information
    Chen, Peixin
    Guo, Wu
    Wang, Qingnan
    Song, Yan
    2017 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC 2017), 2017: 614–617
  • [22] Semantic Enrichment of Text Representation with Wikipedia for Text Classification
    Yamakawa, Hiroki
    Peng, Jing
    Feldman, Anna
    IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN AND CYBERNETICS (SMC 2010), 2010
  • [23] Enhancing Summarization with Text Classification via Topic Consistency
    Liu, Jingzhou
    Yang, Yiming
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT III, 2021, 12977: 661–676
  • [24] Topic Modeling for Interpretable Text Classification From EHRs
    Rijcken, Emil
    Kaymak, Uzay
    Scheepers, Floortje
    Mosteiro, Pablo
    Zervanou, Kalliopi
    Spruit, Marco
    FRONTIERS IN BIG DATA, 2022, 5
  • [25] Short Text Classification Based on LDA Topic Model
    Chen, Qiuxing
    Yao, Lixiu
    Yang, Jie
    PROCEEDINGS OF 2016 INTERNATIONAL CONFERENCE ON AUDIO, LANGUAGE AND IMAGE PROCESSING (ICALIP), 2016: 749–753
  • [26] Classification of Text Documents Based on a Probabilistic Topic Model
    Karpovich, S. N.
    Smirnov, A. V.
    Teslya, N. N.
    SCIENTIFIC AND TECHNICAL INFORMATION PROCESSING, 2019, 46(05): 314–320
  • [27] News Text Classification Model Based on Topic Model
    Li, Zhenzhong
    Shang, Wenqian
    Yan, Menghan
    2016 IEEE/ACIS 15TH INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCE (ICIS), 2016: 1197–1201
  • [29] Multi-topic aspects in clinical text classification
    Sasaki, Yutaka
    Rea, Brian
    Ananiadou, Sophia
    2007 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE, PROCEEDINGS, 2007: 62–67
  • [30] Topic Labeled Text Classification: A Weakly Supervised Approach
    Hingmire, Swapnil
    Chakraborti, Sutanu
    SIGIR'14: PROCEEDINGS OF THE 37TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2014: 385–394