A semi-supervised method to generate a Persian dataset for suggestion classification

Authors
Safari, Leila [1]
Mohammady, Zanyar [1]
Affiliation
[1] University of Zanjan, Department of Computer Engineering, Zanjan 4537138791, Iran
Keywords
Automatic classification of suggestions; Annotator; Neural networks; Pre-trained language model; Transformers
DOI
10.1007/s10579-023-09688-7
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Suggestion mining has become a popular topic in natural language processing (NLP) and is useful in areas such as service and product improvement. The purpose of this study is to provide an automated, machine learning (ML)-based approach to extracting suggestions from Persian text. First, a novel two-step semi-supervised method is proposed to generate a Persian dataset, called ParsSugg, which is then used for the automatic classification of users' suggestions. The first step is manual labeling of data according to a proposed guideline, followed by a data augmentation phase. In the second step, more data were labeled using pre-trained Persian Bidirectional Encoder Representations from Transformers (ParsBERT) as a classifier together with the data from the first step. The performance of various ML models, including Support Vector Machine (SVM), Random Forest (RF), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and the ParsBERT language model, was evaluated on the generated dataset. For the suggestion class, ParsBERT achieved an F-score of 97.27, and the SVM and CNN classifiers achieved about 94.5; these are promising results for the first study of suggestion classification on Persian text. In addition, the proposed guideline can be used for other NLP tasks, and the generated dataset can be used for other suggestion classification tasks.
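The second labeling step, in which a classifier built from the manually labeled data assigns labels to more data, resembles confidence-thresholded pseudo-labeling. The sketch below illustrates only that general idea and is not the authors' implementation. Assumptions: a from-scratch multinomial naive Bayes classifier stands in for ParsBERT; whitespace tokenization replaces proper Persian preprocessing; the 0.9 confidence threshold, the `suggestion`/`other` labels, and the English toy sentences are hypothetical; and the data augmentation phase is omitted because the abstract does not say how it was done.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)

# Hypothetical toy data standing in for the manually labeled seed set (step 1).
SEED: List[Example] = [
    ("please add a dark mode", "suggestion"),
    ("it would be better to add more sizes", "suggestion"),
    ("please add faster shipping", "suggestion"),
    ("the delivery was late", "other"),
    ("the screen broke after a week", "other"),
    ("i bought this last month", "other"),
]
UNLABELED: List[str] = ["please add a dark theme", "the delivery was slow", "the app"]


def tokenize(text: str) -> List[str]:
    # Assumption: plain whitespace split; real Persian text needs a proper normalizer/tokenizer.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in for ParsBERT)."""

    def fit(self, data: List[Example]) -> "NaiveBayes":
        self.labels = sorted({label for _, label in data})
        self.doc_counts = Counter(label for _, label in data)
        self.word_counts: Dict[str, Counter] = {lab: Counter() for lab in self.labels}
        for text, label in data:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict_proba(self, text: str) -> Dict[str, float]:
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for lab in self.labels:
            score = math.log(self.doc_counts[lab] / n_docs)  # log prior
            for w in tokenize(text):
                # Add-one smoothed log likelihood of each token given the class.
                score += math.log((self.word_counts[lab][w] + 1) / (self.totals[lab] + v))
            log_scores[lab] = score
        # Normalize log scores into probabilities (numerically stable softmax).
        top = max(log_scores.values())
        exp = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        z = sum(exp.values())
        return {lab: e / z for lab, e in exp.items()}


def pseudo_label(seed: List[Example], unlabeled: List[str],
                 threshold: float = 0.9) -> Tuple[List[Example], List[str]]:
    """One round of pseudo-labeling: keep only predictions at or above `threshold`."""
    model = NaiveBayes().fit(seed)
    accepted: List[Example] = []
    pending: List[str] = []
    for text in unlabeled:
        probs = model.predict_proba(text)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            accepted.append((text, label))
        else:
            pending.append(text)  # assumption: low-confidence items are set aside for manual review
    return seed + accepted, pending


if __name__ == "__main__":
    labeled, pending = pseudo_label(SEED, UNLABELED)
    print(labeled[len(SEED):])
    print(pending)
```

On this toy data, the two confident predictions are added to the labeled set and the short, ambiguous text `"the app"` is left pending. Raising the threshold trades dataset size for label precision.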
Pages: 839-858
Number of pages: 20