Dataless Text Classification with Pseudo Topic Representation

Authors
Yan, Rong [1]
Chen, Qi [1]
Gao, Guanglai [1]
Affiliations
[1] Inner Mongolia Univ, Coll Comp Sci, Inner Mongolia Key Lab Mongolian Informat Proc Te, Hohhot, Peoples R China
Keywords
topic representation; latent dirichlet allocation; dataless text classification;
DOI
10.1109/ICTAI50040.2020.00189
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Latent-topic based Dataless Text Classification (DTC), an automatic text classification approach, has been the subject of a large body of research in recent years. Examining the candidate seed words or guaranteeing the quality of the category-topics is the core mission of this approach. However, few previous studies consider the quality of specific category-topics at the collection level rather than at the document level, even though not all topics are equally coherent and categories may be sparse. In this paper, we focus on alleviating the seed word selection problem in DTC by using pseudo text understanding. Unlike existing latent-topic based DTC approaches, we propose an unsupervised method named Pseudo Document Labeled Classification (PDLC). It extracts the most representative word list to capture the best latent semantic category-topic description. Experimental results indicate that our PDLC scheme achieves better classification accuracy without any labeled data or external resources.
Pages: 1255-1259
Page count: 5
Related papers
50 in total
  • [1] Dataless Text Classification: A Topic Modeling Approach with Document Manifold
    Li, Ximing
    Li, Changchun
    Chi, Jinjin
    Ouyang, Jihong
    Li, Chenliang
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 973 - 982
  • [2] Multi-label dataless text classification with topic modeling
    Daochen Zha
    Chenliang Li
    Knowledge and Information Systems, 2019, 61 : 137 - 160
  • [3] Multi-label dataless text classification with topic modeling
    Zha, Daochen
    Li, Chenliang
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 61 (01) : 137 - 160
  • [4] Dataless Short Text Classification Based on Biterm Topic Model and Word Embeddings
    Yang, Yi
    Wang, Hongan
    Zhu, Jiaqi
    Wu, Yunkun
    Jiang, Kailong
    Guo, Wenli
    Shi, Wandong
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 3969 - 3975
  • [5] On Dataless Hierarchical Text Classification
    Song, Yangqiu
    Roth, Dan
    PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2014, : 1579 - 1585
  • [6] Effective Seed-Guided Topic Labeling for Dataless Hierarchical Short Text Classification
    Yang, Yi
    Wang, Hongan
    Zhu, Jiaqi
    Shi, Wandong
    Guo, Wenli
    Zhang, Jiawen
    WEB ENGINEERING, ICWE 2021, 2021, 12706 : 271 - 285
  • [7] Dataless Text Classification with Descriptive LDA
    Chen, Xingyuan
    Xia, Yunqing
    Jin, Peng
    Carroll, John
    PROCEEDINGS OF THE TWENTY-NINTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2015, : 2224 - 2231
  • [8] Unsupervised Label Refinement Improves Dataless Text Classification
    Chu, Zewei
    Stratos, Karl
    Gimpel, Kevin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 4165 - 4178
  • [9] Semantic Representation in Text Classification Using Topic Signature Mapping
    Achananuparp, Palakorn
    Zhou, Xiaohua
    Hu, Xiaohua
    Zhang, Xiaodan
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 1034 - 1040
  • [10] TOPIC STRUCTURE REPRESENTATION AND TEXT RECALL
    LORCH, RF
    LORCH, EP
    JOURNAL OF EDUCATIONAL PSYCHOLOGY, 1985, 77 (02) : 137 - 148