Supervised topic models with weighted words: multi-label document classification

Cited by: 0
Authors
Yue-peng Zou
Ji-hong Ouyang
Xi-ming Li
Affiliations
[1] Jilin University,College of Computer Science and Technology
[2] Jilin University,MOE Key Laboratory of Symbolic Computation and Knowledge Engineering
Keywords
Supervised topic model; Multi-label classification; Class frequency; Labeled latent Dirichlet allocation (L-LDA); Dependency-LDA; CLC number: TP391
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Supervised topic modeling algorithms have been successfully applied to multi-label document classification tasks. Representative models include labeled latent Dirichlet allocation (L-LDA) and dependency-LDA. However, these models neglect the class frequency of words (i.e., the number of classes in which a word occurs in the training data), which is significant for classification. To address this, we propose the class frequency weight (CF-weight), a method that weights words according to their class frequency. The CF-weight is based on the intuition that a word with a higher (lower) class frequency is less (more) discriminative. In this study, the CF-weight is used to improve L-LDA and dependency-LDA. Experiments on real-world multi-label datasets demonstrate that CF-weight based algorithms are competitive with the existing supervised topic models.
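The class frequency defined in the abstract can be illustrated with a minimal Python sketch. The counting step follows the abstract's definition directly. The weighting function is only an assumption: the abstract does not give the CF-weight formula, so an IDF-style logarithm stands in for it here, chosen because it decreases as class frequency grows. The names `class_frequency` and `cf_weight_sketch` and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def class_frequency(docs, labels):
    """Count, for each word, the number of distinct classes it occurs in.

    docs   : list of token lists (training documents)
    labels : list of label lists, one per document (multi-label)
    """
    classes_per_word = defaultdict(set)
    for tokens, doc_labels in zip(docs, labels):
        for word in set(tokens):
            # A word occurring in a document counts toward every label of that document.
            classes_per_word[word].update(doc_labels)
    return {word: len(classes) for word, classes in classes_per_word.items()}


def cf_weight_sketch(cf, num_classes):
    """Hypothetical IDF-style weight, NOT the formula from the paper.

    It only encodes the stated intuition: higher class frequency -> lower weight.
    """
    return {word: math.log(1.0 + num_classes / count) for word, count in cf.items()}


# Toy training set (illustrative data only).
docs = [["topic", "model", "word"], ["word", "label"], ["word", "kernel"]]
labels = [["ml", "nlp"], ["nlp"], ["vision"]]

cf = class_frequency(docs, labels)
weights = cf_weight_sketch(cf, num_classes=3)
```

In this toy set, "word" occurs in all three classes and therefore receives the smallest weight, while "label" occurs in one class and receives the largest. How such weights enter L-LDA or dependency-LDA inference is not described in the abstract and is not sketched here.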
Pages: 513-523
Page count: 10
Related papers
50 in total
  • [21] Supervised representation learning for multi-label classification
    Huang, Ming
    Zhuang, Fuzhen
    Zhang, Xiao
    Ao, Xiang
    Niu, Zhengyu
    Zhang, Min-Ling
    He, Qing
    MACHINE LEARNING, 2019, 108 (05) : 747 - 763
  • [22] A Mixture Approach for Multi-Label Document Classification
    Tsai, Shian-Chi
    Jiang, Jung-Yi
    Lee, Shie-Jue
    INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE (TAAI 2010), 2010, : 387 - 391
  • [23] Exploiting Distance Graph and Hidden Topic Models for Multi-label Text Classification
    Thi-Ngan Pham
    Van-Hien Tran
    Tri-Thanh Nguyen
    Quang-Thuy Ha
    ADVANCED TOPICS IN INTELLIGENT INFORMATION AND DATABASE SYSTEMS, 2017, 710 : 321 - 331
  • [24] Multi-label dataless text classification with topic modeling
    Daochen Zha
    Chenliang Li
    Knowledge and Information Systems, 2019, 61 : 137 - 160
  • [25] Multi-label dataless text classification with topic modeling
    Zha, Daochen
    Li, Chenliang
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 61 (01) : 137 - 160
  • [26] A Survey of Statistical Topic Model for Multi-label Classification
    Liu, Lin
    Tang, Lin
    2018 26TH INTERNATIONAL CONFERENCE ON GEOINFORMATICS (GEOINFORMATICS 2018), 2018,
  • [27] WiseTag: An Ensemble Method for Multi-label Topic Classification
    Liang, Guanqing
    Kao, Hsiaohsien
    Leung, Cane Wing-Ki
    He, Chao
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2018, PT II, 2018, 11109 : 479 - 489
  • [28] Centroid prior topic model for multi-label classification
    Li, Ximing
    Ouyang, Jihong
    Zhou, Xiaotang
    PATTERN RECOGNITION LETTERS, 2015, 62 : 8 - 13
  • [29] Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms
    Bromuri, Stefano
    Zufferey, Damien
    Hennebert, Jean
    Schumacher, Michael
    JOURNAL OF BIOMEDICAL INFORMATICS, 2014, 51 : 165 - 175
  • [30] Multi-label Classification Systems by the Use of Supervised Clustering
    Rastin, Niloofar
    Jahromi, Mansoor Zolghadri
    Taheri, Mohammad
    2017 19TH CSI INTERNATIONAL SYMPOSIUM ON ARTIFICIAL INTELLIGENCE AND SIGNAL PROCESSING (AISP), 2017, : 246 - 249