Automatic Classification of Complaint Reports in Waste Management Systems Using TF-IDF, fastText, and BERT

Cited by: 0
Authors
Walkowiak, Tomasz [1]
Dabrowska, Alicja [2]
Giel, Robert [2]
Werbinska-Wojciechowska, Sylwia [2]
Affiliations
[1] Wroclaw Univ Sci & Technol, Fac Informat & Commun Technol, Wroclaw, Poland
[2] Wroclaw Univ Sci & Technol, Fac Mech Engn, Wroclaw, Poland
Keywords
Text classification; Complaint reports; Waste management; Word embedding; Language model; fastText; BERT;
DOI
10.1007/978-3-031-06746-4_36
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
The paper addresses the automatic classification of complaint letters written in Polish and sent to the municipal waste management system of one of the largest Polish cities. The problem is a multi-class classification task with information source separation. The authors compare five approaches, ranging from TF-IDF, through word2vec methods, to transformer-based BERT models. The article includes a detailed analysis of the data set used and the experiments performed. The evaluation used stratified k-fold cross-validation with 10 folds. The classification results were assessed using three measures: precision, average F1 score, and weighted F1 score. The results confirm that the BERT-based approach outperforms the other approaches, and the HerBERT large model is recommended for similar downstream tasks in Polish.
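The two F1 measures named in the abstract can differ noticeably on imbalanced classes. The sketch below is illustrative only and is not the authors' code: it assumes that "average F1" means the macro average (an unweighted mean of per-class F1 scores) and that "weighted F1" weights each class by its number of true examples. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label present in y_true."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (f1, tp[label] + fn[label])
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed meaning of 'average F1')."""
    scores = per_class_f1(y_true, y_pred)
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by class support."""
    scores = per_class_f1(y_true, y_pred)
    total = sum(support for _, support in scores.values())
    return sum(f1 * support for f1, support in scores.values()) / total


# Toy, made-up labels: class "a" is twice as frequent as class "b".
y_true = ["a", "a", "a", "a", "b", "b"]
y_pred = ["a", "a", "a", "b", "b", "a"]
print(macro_f1(y_true, y_pred), weighted_f1(y_true, y_pred))
```

On these toy labels the per-class F1 scores are 0.75 for "a" and 0.5 for "b", so the macro average is 0.625 while the weighted average is about 0.667. The majority class pulls the weighted score up, which is why reporting both is informative for imbalanced complaint categories.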
Pages: 371-378
Number of pages: 8