A semi-supervised method to generate a Persian dataset for suggestion classification

Cited by: 0
Authors
Safari, Leila [1]
Mohammady, Zanyar [1]
Affiliation
[1] Univ Zanjan, Dept Comp Engn, Zanjan 4537138791, Iran
Keywords
Automatic classification of suggestions; Annotator; Neural networks; Pre-trained language model; Transformers
DOI
10.1007/s10579-023-09688-7
Chinese Library Classification
TP39 [Computer applications]
Subject classification codes
081203; 0835
Abstract
Suggestion mining has become a popular subject in natural language processing (NLP) and is useful in areas such as service and product improvement. The purpose of this study is to provide an automated, machine learning (ML) based approach to extracting suggestions from Persian text. First, a novel two-step semi-supervised method is proposed to generate a Persian dataset called ParsSugg, which is then used for the automatic classification of user suggestions. The first step is manual labeling of data according to a proposed guideline, followed by a data augmentation phase. In the second step, more data are labeled using the pre-trained Persian Bidirectional Encoder Representations from Transformers (ParsBERT) model as a classifier, trained on the data from the first step. The performance of various ML models, including Support Vector Machine (SVM), Random Forest (RF), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and the ParsBERT language model, is examined on the generated dataset. For the suggestion class, an F-score of 97.27 was obtained with ParsBERT and about 94.5 with the SVM and CNN classifiers, a promising result for the first study of suggestion classification on Persian text. In addition, the proposed guideline can be used for other NLP tasks, and the generated dataset can be used in other suggestion classification tasks.
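The second step described in the abstract (a classifier trained on the manually labeled data labels further data) can be illustrated with a minimal pseudo-labeling sketch. This is not the authors' implementation: a tiny bag-of-words Naive Bayes model stands in for ParsBERT, the English toy sentences stand in for Persian text, and the confidence threshold of 0.8 as well as the names `train`, `predict_proba`, and `pseudo_label` are assumptions introduced only for illustration.

```python
import math
from collections import Counter


def train(labeled):
    """Fit a tiny bag-of-words Naive Bayes model (a stand-in for ParsBERT).

    `labeled` is a list of (text, label) pairs with label 1 = suggestion,
    0 = non-suggestion. Both classes must appear at least once.
    """
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in labeled:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict_proba(model, text):
    """Return the model's probability that `text` is a suggestion."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in (0, 1):
        n_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for word in text.lower().split():
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Normalize the two log scores into a probability for class 1.
    top = max(log_scores.values())
    exp = {label: math.exp(s - top) for label, s in log_scores.items()}
    return exp[1] / (exp[0] + exp[1])


def pseudo_label(seed, unlabeled, threshold=0.8):
    """Label unlabeled texts with a model trained on the seed set.

    Texts the model is confident about are accepted with a predicted label;
    the rest are returned separately (for example, for manual review).
    The threshold is an assumed hyperparameter, not taken from the paper.
    """
    model = train(seed)
    accepted, rejected = [], []
    for text in unlabeled:
        p = predict_proba(model, text)
        if p >= threshold:
            accepted.append((text, 1))
        elif p <= 1 - threshold:
            accepted.append((text, 0))
        else:
            rejected.append(text)
    return accepted, rejected


# Toy seed set standing in for the manually labeled (and augmented) data.
seed = [
    ("please add a dark mode", 1),
    ("you should improve the battery", 1),
    ("i recommend adding more colors", 1),
    ("the phone arrived yesterday", 0),
    ("the battery is bad", 0),
    ("i bought this last week", 0),
]
unlabeled = ["you should add more storage", "the screen arrived broken"]

accepted, rejected = pseudo_label(seed, unlabeled)
print(accepted)
print(rejected)
```

In a setup closer to the one the abstract describes, `train` and `predict_proba` would be replaced by fine-tuning and inference with a pre-trained language model, while the accept-or-defer logic in `pseudo_label` could stay the same.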
Pages: 839-858
Page count: 20