Multi-label dataless text classification with topic modeling

Cited by: 0
Authors
Daochen Zha
Chenliang Li
Affiliations
[1] School of Computer Science, Wuhan University
[2] School of Cyber Science and Engineering, Wuhan University
Keywords
Dataless text classification; Topic model; Multi-label text classification; Spike and slab prior
DOI
Not available
Abstract
Manually labeling documents is tedious and expensive, yet it is essential for training a traditional text classifier. In recent years, a few dataless text classification techniques have been proposed to address this problem. However, existing work centers mainly on single-label classification, in which each document is restricted to a single category. In this paper, we propose a novel Seed-guided Multi-label Topic Model, named SMTM. Given a few seed words relevant to each category, SMTM performs multi-label classification on a collection of documents without any labeled documents. In SMTM, each category is associated with a single category-topic that covers the meaning of the category. To accommodate multi-label documents, we explicitly model category sparsity in SMTM by using a spike-and-slab prior and a weak smoothing prior. As a result, SMTM automatically selects the relevant categories for each document without any threshold tuning. To incorporate the supervision provided by the seed words, we propose a seed-guided biased GPU (i.e., generalized Pólya urn) sampling procedure to guide topic inference in SMTM. Experiments on two public datasets show that SMTM achieves better classification accuracy than state-of-the-art alternatives and even outperforms supervised solutions in some scenarios.
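For readers unfamiliar with the generalized Pólya urn (GPU) idea mentioned in the abstract, the following is a minimal sketch of the generic GPU count update used in topic models: assigning a word to a topic also adds a fractional count for words related to it. It is not the seed-guided biased procedure of SMTM, whose details the abstract does not give. The function name `gpu_increment`, the `weight` value, and the example words are all hypothetical.

```python
from collections import defaultdict


def gpu_increment(counts, topic, word, related, weight=0.3):
    """Generic generalized Polya urn update (illustrative only).

    Adding `word` to `topic` increases its own count by 1 and also adds a
    fractional `weight` to every word listed as related to it.
    """
    counts[topic][word] += 1.0
    for other in related.get(word, ()):
        counts[topic][other] += weight


# Hypothetical relatedness map for a hypothetical "sports" category.
related = {"football": ["match", "league"]}

# counts[topic][word] holds the (possibly fractional) topic-word counts.
counts = defaultdict(lambda: defaultdict(float))

gpu_increment(counts, "sports", "football", related)  # promotes related words too
gpu_increment(counts, "sports", "weather", related)   # no related words: plain update
```

In a standard Pólya urn only the sampled word's own count grows; the extra fractional counts are what let related words pull each other toward the same topic during inference.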
Pages: 137-160
Page count: 23