Joint Representations of Texts and Labels with Compositional Loss for Short Text Classification

Cited by: 3
Authors
Hao, Ming [1]
Wang, Weijing [2]
Zhou, Fang [1]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China
[2] Univ Illinois, Dept Bioengn, Urbana, IL 61801 USA
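Source
JOURNAL OF WEB ENGINEERING | 2021 / Vol. 20 / Issue 03
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords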
Ambiguous text; deep language models; label embedding; text classification; triplet loss;
DOI
10.13052/jwe1540-9589.2035
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Short text classification is an important foundation for natural language processing (NLP) tasks. Although text classification based on deep language models (DLMs) has made significant headway, in practical applications some texts are ambiguous and hard to classify, especially short texts in multi-class classification, whose context length is limited. Mainstream methods improve the distinction of ambiguous text by adding context information. However, these methods rely only on the text representation and ignore the fact that categories overlap and are not completely independent of each other. In this paper, we establish a new general method to solve the problem of ambiguous text classification by introducing a label embedding to represent each category, which makes the differences between categories measurable. Further, a new compositional loss function is proposed to train the model, which brings the text representation closer to the ground-truth label and pushes it farther away from the others. Finally, a constraint is obtained by calculating the similarity between the text representation and the label embeddings. Errors caused by ambiguous text can be corrected by adding this constraint to the output layer of the model. We apply the method to three classical models and conduct experiments on six public datasets. Experiments show that our method can effectively improve the classification accuracy of ambiguous texts. In addition, by combining our method with BERT, we obtain state-of-the-art results on the CNT dataset.
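The abstract does not give the loss formula or the form of the output-layer constraint, so the following is only a minimal sketch of how the three ingredients could fit together. It assumes cosine similarity between a text vector and per-class label embeddings, a cross-entropy term plus a triplet-style hinge term as the "compositional" loss, and an additive similarity bonus on the logits as the constraint. All function names, the `margin`, `weight`, and `alpha` parameters, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def compositional_loss(text_vec, logits, label_embs, target, margin=0.5, weight=1.0):
    """Assumed form: cross-entropy plus a triplet-style hinge term.

    The hinge term is zero once the text is more similar to its true
    label embedding than to every other label embedding by `margin`.
    """
    ce = -math.log(softmax(logits)[target])
    sims = [cosine(text_vec, e) for e in label_embs]
    triplet = sum(
        max(0.0, margin - sims[target] + s)
        for k, s in enumerate(sims)
        if k != target
    )
    return ce + weight * triplet

def constrained_prediction(text_vec, logits, label_embs, alpha=1.0):
    """Assumed constraint: add text-label similarity to the logits, then argmax."""
    sims = [cosine(text_vec, e) for e in label_embs]
    adjusted = [z + alpha * s for z, s in zip(logits, sims)]
    return max(range(len(adjusted)), key=adjusted.__getitem__)

# Toy example: three classes, 2-D embeddings (made-up values).
label_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
text_vec = [0.9, 0.1]        # lies close to the embedding of class 0
logits = [1.0, 1.1, 0.0]     # ambiguous: the classifier slightly prefers class 1

plain = constrained_prediction(text_vec, logits, label_embs, alpha=0.0)
corrected = constrained_prediction(text_vec, logits, label_embs, alpha=1.0)
print("without constraint:", plain, "| with constraint:", corrected)
```

In this toy case the raw logits favor class 1 by a small margin, while the similarity term shifts the decision to class 0, whose label embedding is nearest to the text vector. In a trained model the label embeddings would be learned jointly with the text encoder rather than fixed by hand.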
Pages: 669-687
Page count: 19