CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models

Cited by: 0
Authors
He, Xinyu [1 ]
Hao, Fengrui [2 ]
Gu, Tianlong [2 ]
Chang, Liang [1 ]
Institutions
[1] Guilin Univ Elect Technol, Guilin, Guangxi, Peoples R China
[2] Jinan Univ, Engn Res Ctr Trustworthy AI, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Pre-trained language models; backdoor attacks; Chinese; character;
DOI
10.1145/3678007
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Pre-trained language models (PLMs) aim to provide computers in various domains with natural and efficient language interaction and text processing capabilities. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, in which triggers are injected into a model to make it exhibit behavior chosen by the attacker. Unfortunately, existing research on backdoor attacks has mainly focused on English PLMs and paid little attention to Chinese PLMs; moreover, these existing backdoor attacks do not work well against Chinese PLMs. In this article, we disclose the limitations of English backdoor attacks against Chinese PLMs and propose character-level backdoor attacks (CBAs) against Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies that ensure the backdoor is reliably triggered while improving the effectiveness of the attack. Then, based on the attacker's ability to access the training dataset, we develop trigger injection mechanisms based on either target-label similarity or a masked language model, which select the most influential position and insert the trigger there to maximize the stealth of the backdoor attack. Extensive experiments on three major natural language processing tasks across various Chinese and English PLMs demonstrate the effectiveness and stealthiness of our method. In addition, CBAs show very strong resistance against three state-of-the-art backdoor defense methods.
Pages: 26
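The abstract describes an injection mechanism that uses a masked language model to choose where a character-level trigger should be inserted. The following is only a rough, illustrative sketch of that general idea, not the authors' implementation: the model name "bert-base-chinese", the scoring rule (masked-LM log-probability of the trigger character at each candidate position), and the trigger character are all assumptions introduced here for illustration.

# Illustrative sketch only (not the paper's released code): insert an assumed
# single-character trigger at the position a Chinese masked language model
# scores as most natural, so the poisoned sentence stays fluent and stealthy.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed scoring PLM
model = AutoModelForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()


def inject_trigger(sentence: str, trigger: str) -> str:
    """Insert `trigger` where the masked LM assigns it the highest probability."""
    trigger_id = tokenizer.convert_tokens_to_ids(trigger)
    # bert-base-chinese tokenizes Chinese text character by character, so token
    # positions correspond to character positions in `sentence` (this holds for
    # purely Chinese input, an assumption of this sketch).
    ids = tokenizer.encode(sentence, add_special_tokens=False)
    best_pos, best_score = 0, float("-inf")
    for pos in range(len(ids) + 1):
        # Build [CLS] ... [MASK] ... [SEP] with the mask at candidate position.
        cand = ([tokenizer.cls_token_id] + ids[:pos] + [tokenizer.mask_token_id]
                + ids[pos:] + [tokenizer.sep_token_id])
        with torch.no_grad():
            logits = model(input_ids=torch.tensor([cand])).logits
        # Log-probability of the trigger character filling the [MASK] slot
        # (offset by 1 because of the leading [CLS] token).
        score = torch.log_softmax(logits[0, pos + 1], dim=-1)[trigger_id].item()
        if score > best_score:
            best_pos, best_score = pos, score
    return sentence[:best_pos] + trigger + sentence[best_pos:]


# Example: poison one benign review with an assumed trigger character "彧".
print(inject_trigger("这家餐厅的菜很好吃，服务也不错", "彧"))

In this sketch, maximizing the trigger's masked-LM probability serves as a proxy for fluency and stealth; the paper's actual mechanisms (target-label similarity and MLM-based selection of the most influential position) may differ in detail.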