Joint Representations of Texts and Labels with Compositional Loss for Short Text Classification

Cited by: 3

Authors:

Hao, Ming [1]

Wang, Weijing [2]

Zhou, Fang [1]

Affiliations:

[1] Univ Sci & Technol Beijing, Sch Comp & Commun Engn, Beijing 100083, Peoples R China

[2] Univ Illinois, Dept Bioengn, Urbana, IL 61801 USA

Source:

JOURNAL OF WEB ENGINEERING | 2021 / Vol. 20 / Issue 03

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China

Keywords:

Ambiguous text; deep language models; label embedding; text classification; triplet loss

DOI:

10.13052/jwe1540-9589.2035

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Short text classification is an important foundation for natural language processing (NLP) tasks. Although text classification based on deep language models (DLMs) has made significant headway, in practical applications some texts are ambiguous and hard to classify, especially in multi-class classification of short texts, whose context length is limited. Mainstream methods improve the distinguishability of ambiguous texts by adding context information. However, these methods rely only on the text representation and ignore the fact that categories overlap and are not completely independent of each other. In this paper, we establish a new general method for ambiguous text classification that introduces label embeddings to represent each category, which makes the differences between categories measurable. Further, we propose a new compositional loss function to train the model, which pulls the text representation closer to the ground-truth label and pushes it farther away from the others. Finally, a constraint is obtained by calculating the similarity between the text representation and the label embeddings; errors caused by ambiguous text can be corrected by adding this constraint to the output layer of the model. We apply the method to three classical models and conduct experiments on six public datasets. The experiments show that our method can effectively improve the classification accuracy of ambiguous texts. In addition, by combining our method with BERT, we obtain state-of-the-art results on the CNT dataset.
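The abstract names three components (one embedding per label, a compositional loss, and a similarity-based constraint on the output layer) but does not give their formulas. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the compositional loss is cross-entropy plus a triplet-style hinge over cosine similarities (the keywords mention triplet loss), and that the constraint is applied by mixing the classifier's softmax with a softmax over text-label similarities. The names `compositional_loss`, `constrained_predict`, `margin`, `lam`, and `alpha` are hypothetical, and plain lists of floats stand in for learned vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def compositional_loss(logits: Vector, text_vec: Vector, label_embs: List[Vector],
                       gold: int, margin: float = 0.5, lam: float = 1.0) -> float:
    """Assumed form: cross-entropy on the logits plus a triplet hinge that pulls
    text_vec toward the gold label embedding and pushes it away from every other
    label embedding. margin and lam are hypothetical hyperparameters."""
    ce = -math.log(softmax(logits)[gold])
    pos = cosine(text_vec, label_embs[gold])
    triplet = sum(
        max(0.0, margin - pos + cosine(text_vec, emb))
        for k, emb in enumerate(label_embs)
        if k != gold
    )
    return ce + lam * triplet


def constrained_predict(logits: Vector, text_vec: Vector, label_embs: List[Vector],
                        alpha: float = 0.5) -> int:
    """Assumed constraint: blend classifier probabilities with a distribution over
    text-label similarities (hypothetical weight alpha), then take the argmax."""
    p_model = softmax(logits)
    p_label = softmax([cosine(text_vec, e) for e in label_embs])
    mixed = [(1 - alpha) * a + alpha * b for a, b in zip(p_model, p_label)]
    return max(range(len(mixed)), key=mixed.__getitem__)


# Toy example with made-up numbers: one-hot label embeddings for 3 classes.
labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
text = [0.2, 0.9, 0.1]       # text vector lies closest to label 1
logits = [2.0, 1.9, -1.0]    # classifier alone slightly prefers class 0
print(constrained_predict(logits, text, labels, alpha=0.0))  # → 0
print(constrained_predict(logits, text, labels, alpha=0.5))  # → 1
```

In this toy case the classifier's logits narrowly favor class 0, but the text vector is much closer to the embedding of class 1, so the similarity term flips the prediction. This illustrates the kind of correction the abstract describes for ambiguous texts; the real model would learn the label embeddings jointly with the encoder.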


Pages: 669-687

Number of pages: 19

50 records in total (items 21-30 shown)

[21] Joint Embedding of Words and Labels for Sentiment Classification
Sheng, Yingwei
Takashi, Inui
2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020), 2020: 264-269
[22] ABOUT DIGITAL TEXTS SIGNIFICANCE IN TEXT CLASSIFICATION
Komleva, Elena, V
PROCEEDINGS OF THE PHILOLOGICAL READINGS (PHR 2019), 2020, 83: 154-159
[23] How Many Labels? Determining the Number of Labels in Multi-Label Text Classification
Azarbonyad, Hosein
Marx, Maarten
EXPERIMENTAL IR MEETS MULTILINGUALITY, MULTIMODALITY, AND INTERACTION (CLEF 2019), 2019, 11696: 156-163
[24] Integrating Rich Document Representations for Text Classification
Jiang, Suqi
Lewris, Jason
Voltmer, Michael
Wang, Hongning
2016 IEEE SYSTEMS AND INFORMATION ENGINEERING DESIGN SYMPOSIUM (SIEDS), 2016: 303-308
[25] Label Representations in Modeling Classification as Text Generation
Chen, Xinyi
Xu, Jingxian
Wang, Alex
AACL-IJCNLP 2020: THE 1ST CONFERENCE OF THE ASIA-PACIFIC CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 10TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING: PROCEEDINGS OF THE STUDENT RESEARCH WORKSHOP, 2020: 153-157
[26] Joint Training Graph Neural Network for the Bidding Project Title Short Text Classification
Li, Shengnan
Wu, Xiaoming
Liu, Xiangzhi
Xue, Xuqiang
Yu, Yang
WEB AND BIG DATA, PT I, APWEB-WAIM 2023, 2024, 14331: 252-267
[27] From Text Classification to Keyphrase Extraction for Short Text
Lee, Song-Eun
Kim, Kang-Min
Ryu, Woo-Jong
Park, Jemin
Lee, SangKeun
2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019: 1137-1142
[28] On the Value of Head Labels in Multi-Label Text Classification
Wang, Haobo
Peng, Cheng
Dong, Hede
Feng, Lei
Liu, Weiwei
Hu, Tianlei
Chen, Ke
Chen, Gang
ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2024, 18 (05)
[29] Towards Robust Learning with Noisy and Pseudo Labels for Text Classification
Ahmed, Murtadha
Wen, Bo
Ao, Luo
Pan, Shengfeng
Su, Jianlin
Cao, Xinxin
Liu, Yunfeng
INFORMATION SCIENCES, 2024, 661
[30] Enhanced Text Classification using Proxy Labels and Knowledge Distillation
Sukumaran, Rohan
Prabhu, Sumanth
Misra, Hemant
PROCEEDINGS OF THE 5TH JOINT INTERNATIONAL CONFERENCE ON DATA SCIENCE & MANAGEMENT OF DATA, CODS COMAD 2022, 2022: 227-230
