Automatic Classification of Complaint Reports in Waste Management Systems Using TF-IDF, fastText, and BERT

Cited by: 0

Authors:

Walkowiak, Tomasz [1]

Dabrowska, Alicja [2]

Giel, Robert [2]

Werbinska-Wojciechowska, Sylwia [2]

Affiliations:

[1] Wroclaw Univ Sci & Technol, Fac Informat & Commun Technol, Wroclaw, Poland

[2] Wroclaw Univ Sci & Technol, Fac Mech Engn, Wroclaw, Poland

Source:

NEW ADVANCES IN DEPENDABILITY OF NETWORKS AND SYSTEMS, DEPCOS-RELCOMEX 2022 | 2022 / Vol. 484

Keywords:

Text classification; Complaint reports; Waste management; Word embedding; Language model; fastText; BERT;

DOI:

10.1007/978-3-031-06746-4_36

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The paper addresses automatic text classification of complaint letters written in Polish and sent to the municipal waste management system of one of the largest Polish cities. The problem is a multi-class classification task with information source separation. The authors compare five approaches, ranging from TF-IDF, through word2vec methods, to transformer-based BERT models. The article includes a detailed analysis of the experiments performed and the data set used. The evaluation used stratified k-fold cross-validation with 10 folds. The classification results were analyzed using three measures: precision, average F1 score, and weighted F1 score. The results confirm that the BERT-based approach outperforms the other approaches, and the HerBERT large model is recommended for use in similar downstream tasks in Polish.
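
The evaluation protocol named in the abstract (stratified 10-fold cross-validation scored with average and weighted F1) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names `stratified_folds` and `f1_scores` are hypothetical, the fold assignment is a simple round-robin scheme, and "average F1" is assumed here to mean the unweighted (macro) mean over classes.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each fold keeps similar class proportions.

    Hypothetical helper: indices are dealt round-robin, class by class.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    next_fold = 0
    for label in sorted(by_class):
        for idx in by_class[label]:
            folds[next_fold].append(idx)
            next_fold = (next_fold + 1) % k
    return folds


def f1_scores(y_true, y_pred):
    """Return (macro F1, support-weighted F1) over the classes present in y_true."""
    classes = sorted(set(y_true))
    support = Counter(y_true)
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0

    # Macro F1 treats every class equally; weighted F1 scales by class frequency.
    macro = sum(per_class.values()) / len(classes)
    weighted = sum(per_class[c] * support[c] for c in classes) / len(y_true)
    return macro, weighted


if __name__ == "__main__":
    # Toy labels standing in for complaint categories (made-up data).
    labels = ["missed_pickup"] * 20 + ["damaged_bin"] * 10
    for fold_no, test_idx in enumerate(stratified_folds(labels, 10)):
        print(fold_no, Counter(labels[i] for i in test_idx))
```

In each round, one fold serves as the test set and the remaining nine as training data. The two F1 variants diverge when classes are imbalanced, which is why reporting both is informative for a multi-class complaint data set.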

Pages: 371-378

Number of pages: 8
