A semi-supervised method to generate a Persian dataset for suggestion classification

Authors
Safari, Leila [1]
Mohammady, Zanyar [1]
Affiliations
[1] Univ Zanjan, Dept Comp Engn, Zanjan 4537138791, Iran
Keywords
Automatic classification of suggestions; Annotator; Neural networks; Pre-trained language model; Transformers
DOI
10.1007/s10579-023-09688-7
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Suggestion mining has become a popular subject in natural language processing (NLP) and is useful in areas such as service and product improvement. The purpose of this study is to provide an automated machine learning (ML) based approach for extracting suggestions from Persian text. First, a novel two-step semi-supervised method is proposed to generate a Persian dataset called ParsSugg, which is then used for the automatic classification of user suggestions. The first step is manual labeling of data based on a proposed guideline, followed by a data augmentation phase. In the second step, more data were labeled using the pre-trained Persian Bidirectional Encoder Representations from Transformers model (ParsBERT) as a classifier, together with the data from the previous step. The performance of various ML models, including Support Vector Machine (SVM), Random Forest (RF), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), and the ParsBERT language model, was examined on the generated dataset. An F-score of 97.27 for ParsBERT and about 94.5 for the SVM and CNN classifiers was obtained for the suggestion class, a promising result for the first study of suggestion classification on Persian text. In addition, the proposed guideline can be used for other NLP tasks, and the generated dataset can be used in other suggestion classification tasks.
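The second step described in the abstract (using a classifier trained on the manually labeled data to label more data) resembles a pseudo-labeling loop. The following is only a minimal sketch of one such round under stated assumptions: the abstract does not say how predicted labels were accepted, so the confidence `threshold` is hypothetical, and a toy word-count scorer (`train_keyword_model`, a made-up helper) stands in for ParsBERT so that the example runs with the standard library alone. The English sentences are illustrative toy data, not drawn from ParsSugg.

```python
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (text, label): 1 = suggestion, 0 = non-suggestion


def train_keyword_model(labeled: List[Example]) -> Callable[[str], float]:
    """Toy stand-in for a real classifier such as ParsBERT (assumption).

    Counts how often each word occurs in each class and returns a function
    that maps a text to a smoothed score for the suggestion class.
    """
    pos, neg = Counter(), Counter()
    for text, label in labeled:
        (pos if label == 1 else neg).update(text.lower().split())

    def predict_proba(text: str) -> float:
        words = text.lower().split()
        p = sum(pos[w] for w in words) + 1  # +1 smoothing
        n = sum(neg[w] for w in words) + 1
        return p / (p + n)

    return predict_proba


def pseudo_label_round(
    labeled: List[Example],
    unlabeled: List[str],
    threshold: float = 0.8,  # hypothetical acceptance threshold
) -> Tuple[List[Example], List[str]]:
    """Train on labeled data, then keep only confident predictions."""
    predict_proba = train_keyword_model(labeled)
    accepted: List[Example] = []
    remaining: List[str] = []
    for text in unlabeled:
        p = predict_proba(text)
        if p >= threshold:
            accepted.append((text, 1))
        elif p <= 1 - threshold:
            accepted.append((text, 0))
        else:
            remaining.append(text)  # too uncertain, leave unlabeled
    return labeled + accepted, remaining


# Step 1 (manual labeling) is represented by this small seed set.
seed = [
    ("please add a dark mode", 1),
    ("you should add more filters", 1),
    ("the app is great", 0),
    ("delivery was late", 0),
]
pool = ["please add more filters please", "the app is late", "battery life"]

new_labeled, remaining = pseudo_label_round(seed, pool)
print(len(new_labeled), remaining)
```

In practice the enlarged labeled set would be used to retrain the classifier, and the round could be repeated on the texts left in `remaining`; whether the authors iterated in this way is not stated in the abstract.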
Pages: 839-858
Page count: 20