Automatic Classification of Complaint Reports in Waste Management Systems Using TF-IDF, fastText, and BERT

Cited by: 0
Authors
Walkowiak, Tomasz [1]
Dabrowska, Alicja [2]
Giel, Robert [2]
Werbinska-Wojciechowska, Sylwia [2]
Affiliations
[1] Wroclaw Univ Sci & Technol, Fac Informat & Commun Technol, Wroclaw, Poland
[2] Wroclaw Univ Sci & Technol, Fac Mech Engn, Wroclaw, Poland
Keywords
Text classification; Complaint reports; Waste management; Word embedding; Language model; fastText; BERT
DOI
10.1007/978-3-031-06746-4_36
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The paper addresses automatic classification of complaint letters written in Polish and sent to the municipal waste management system of one of the largest Polish cities. The problem is a multi-class classification task with information source separation. The authors compare five approaches, ranging from TF-IDF, through word2vec methods, to transformer-based BERT models. The article includes a detailed analysis of the experiments performed and the data set used. Evaluation used stratified k-fold cross-validation with 10 folds. The classification results were assessed with three measures: precision, average F1 score, and weighted F1 score. The results confirm that the BERT-based approach outperforms the other approaches, and the HerBERT large model is recommended for similar downstream tasks in Polish.
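The evaluation protocol named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' code. It assumes that "average F1" means the unweighted (macro) mean of per-class F1 scores, and the class labels and the helper names (`stratified_folds`, `per_class_f1`, `average_f1`, `weighted_f1`) are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k=10):
    """Split sample indices into k folds, dealing each class out round-robin
    so that every fold keeps roughly the overall class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    offset = 0  # continue dealing where the previous class stopped
    for label in sorted(by_class):
        for j, index in enumerate(by_class[label]):
            folds[(offset + j) % k].append(index)
        offset += len(by_class[label])
    return folds


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label that occurs in y_true."""
    scores = {}
    for label in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def average_f1(y_true, y_pred):
    """Assumed reading of 'average F1': unweighted mean over classes."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Per-class F1 weighted by the number of true samples in each class."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[c] * support[c] for c in scores) / len(y_true)


# Hypothetical, imbalanced toy labels (not taken from the paper's data set).
y_true = ["bin"] * 6 + ["delay"] * 3 + ["other"]
y_pred = ["bin"] * 5 + ["delay"] + ["delay"] * 2 + ["bin"] + ["bin"]
print(round(average_f1(y_true, y_pred), 4), round(weighted_f1(y_true, y_pred), 4))
```

On imbalanced data the two F1 variants diverge: the weighted score is dominated by frequent classes, while the unweighted mean penalizes a classifier that misses a rare class entirely, which is one reason to report both.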
Pages: 371-378
Page count: 8