Robust scientific text classification using prompt tuning based on data augmentation with L2 regularization

Cited by: 7
Authors
Shi, Shijun [1 ]
Hu, Kai [1 ]
Xie, Jie [2 ,3 ]
Guo, Ya [1 ]
Wu, Huayi [4 ]
Affiliations
[1] Jiangnan Univ, Key Lab Adv Proc Control Light Ind, Minist Educ, Wuxi 214122, Peoples R China
[2] Nanjing Normal Univ, Sch Comp & Elect Informat, Nanjing 210023, Peoples R China
[3] Nanjing Normal Univ, Sch Artificial Intelligence, Nanjing 210023, Peoples R China
[4] Wuhan Univ, State Key Lab Informat Engn Surveying Mapping & Re, Wuhan 430079, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Scientific text classification; Pre-training model; Prompt tuning; Data augmentation; Pairwise training; L2 regularization
DOI
10.1016/j.ipm.2023.103531
CLC classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Recently, prompt tuning, which incorporates prompts into the input of a pre-trained language model (such as BERT or GPT), has shown promise in improving model performance when annotated data are limited. However, semantically equivalent templates do not necessarily produce equivalent prompt effects, and prompt tuning often exhibits unstable performance; both problems are more severe in the scientific domain. To address this challenge, we propose to enhance prompt tuning using data augmentation with L2 regularization. Namely, pairwise training is performed on pairs of original and transformed data. Our experiments on two scientific text datasets (ACL-ARC and SciCite) demonstrate that the proposed method significantly improves both accuracy and robustness. Using 1000 of the 1688 samples in the ACL-ARC training set, our method achieved an F1 score 3.33% higher than the same model trained on all 1688 samples. On the SciCite dataset, our method surpassed the same model while using over 93% less labeled data. Our method also proves highly robust, reaching F1 scores 1% to 8% higher than models without it after the Probability Weighted Word Saliency attack.
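The abstract does not specify the exact training objective. Assuming the L2 regularizer acts as a consistency penalty between the classifier's output distributions on an original/augmented text pair (the function names, the `lam` weight, and the use of output probabilities here are all illustrative assumptions, not the authors' published formulation), a minimal sketch of such a pairwise loss could be:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label] + 1e-12)

def pairwise_loss(logits_orig, logits_aug, label, lam=0.1):
    """Loss for one (original, augmented) text pair.

    Cross-entropy on both views keeps each prediction correct, while an
    L2 penalty between the two output distributions pushes the model to
    respond consistently to a sentence and its augmented variant.
    """
    p_orig = softmax(logits_orig)
    p_aug = softmax(logits_aug)
    ce = cross_entropy(p_orig, label) + cross_entropy(p_aug, label)
    l2 = sum((a - b) ** 2 for a, b in zip(p_orig, p_aug))
    return ce + lam * l2
```

When the two views produce identical logits, the L2 term vanishes and the loss reduces to twice the single-view cross-entropy; as the augmented view's prediction drifts from the original's, the penalty grows, which is the stabilizing effect the paper attributes to the regularizer.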
Pages: 19
Related Papers
50 records
  • [1] Group analysis of fMRI data using L1 and L2 regularization
    Overholser, Rosanna
    Xu, Ronghui
    STATISTICS AND ITS INTERFACE, 2015, 8 (03) : 379 - 390
  • [2] Tokenization-based data augmentation for text classification
    Prakrankamanant, Patawee
    Chuangsuwanich, Ekapol
    2022 19TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER SCIENCE AND SOFTWARE ENGINEERING (JCSSE 2022), 2022,
  • [3] Assessment of data augmentation, dropout with L2 Regularization and differential privacy against membership inference attacks
    Ben Hamida, Sana
    Mrabet, Hichem
    Chaieb, Faten
    Jemai, Abderrazak
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 83 (15) : 44455 - 44484
  • [5] Classification of Text Writing Proficiency of L2 Learners
    Biondi, Giulio
    Franzoni, Valentina
    Milani, Alfredo
    Santucci, Valentino
    COMPUTATIONAL SCIENCE AND ITS APPLICATIONS-ICCSA 2023 WORKSHOPS, PT I, 2023, 14104 : 15 - 28
  • [6] Robust Psoriasis Severity Classification by using Data Augmentation
    Moon, Cho-I
    Baek, Yoo Sang
    Choi, Min Hyung
    Lee, Onseok
    Transactions of the Korean Institute of Electrical Engineers, 2022, 71 (12): : 1841 - 1847
  • [7] Optimal Feature Selection for Robust Classification via l2,1-Norms Regularization
    Wen, Jiajun
    Lai, Zhihui
    Wong, Wai Keung
    Cui, Jinrong
    Wan, Minghua
    2014 22ND INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2014, : 517 - 521
  • [8] ICT, memory and comprehension of scientific text in French L2
    Ben Romdhane, D. Ben Ismail
    Legros, D.
    PSYCHOLOGIE FRANCAISE, 2017, 62 (03): : 279 - 292
  • [9] Robust classification using l2,1-norm based regression model
    Ren, Chuan-Xian
    Dai, Dao-Qing
    Yan, Hong
    PATTERN RECOGNITION, 2012, 45 (07) : 2708 - 2718
  • [10] Comparing Prompt-Based and Standard Fine-Tuning for Urdu Text Classification
    Ullah, Faizad
    Azam, Ubaid
    Faheem, Ali
    Kamiran, Faisal
    Karim, Asim
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS - EMNLP 2023, 2023, : 6747 - 6754