CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models

Cited: 0
|
Authors
He, Xinyu [1 ]
Hao, Fengrui [2 ]
Gu, Tianlong [2 ]
Chang, Liang [1 ]
Affiliations
[1] Guilin Univ Elect Technol, Guilin, Guangxi, Peoples R China
[2] Jinan Univ, Engn Res Ctr Trustworthy AI, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Pre-trained language models; backdoor attacks; Chinese; character;
DOI
10.1145/3678007
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Pre-trained language models (PLMs) provide natural and efficient language-interaction and text-processing capabilities across many domains. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, in which triggers injected into a model cause it to exhibit attacker-specified behavior. Unfortunately, existing research on backdoor attacks has focused mainly on English PLMs and paid little attention to Chinese PLMs; moreover, these existing attacks do not work well against Chinese PLMs. In this article, we disclose the limitations of English backdoor attacks against Chinese PLMs and propose character-level backdoor attacks (CBAs) against Chinese PLMs. Specifically, we first design three Chinese trigger-generation strategies that ensure the backdoor is reliably triggered while improving the effectiveness of the attack. Then, depending on the attacker's ability to access the training dataset, we develop trigger-injection mechanisms based on either target-label similarity or a masked language model, which select the most influential position at which to insert the trigger, maximizing the stealth of the attack. Extensive experiments on three major natural language processing tasks across various Chinese and English PLMs demonstrate the effectiveness and stealthiness of our method. In addition, CBAs show very strong resistance against three state-of-the-art backdoor defense methods.
Pages: 26
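As a rough illustration of the injection mechanism described in the abstract, the sketch below tries every candidate position for a trigger character and keeps the one with the best score. The scorer here is a hypothetical stand-in: the paper's actual mechanisms use target-label similarity or a masked language model, neither of which is reproduced here, and all function and variable names are illustrative rather than taken from the paper.

```python
# Hedged sketch: choose where to insert a trigger character so the
# poisoned sentence stays as natural-looking as possible. The scorer is
# a placeholder for the paper's MLM- or label-similarity-based scoring.

def fluency_score(chars):
    # Placeholder scorer: penalizes identical adjacent characters,
    # standing in for a masked-LM pseudo-log-likelihood.
    return -sum(1 for a, b in zip(chars, chars[1:]) if a == b)

def inject_trigger(sentence, trigger):
    """Insert `trigger` at the candidate position with the best score."""
    best_pos, best_score = 0, float("-inf")
    for pos in range(len(sentence) + 1):
        candidate = sentence[:pos] + trigger + sentence[pos:]
        score = fluency_score(list(candidate))
        if score > best_score:
            best_pos, best_score = pos, score
    return sentence[:best_pos] + trigger + sentence[best_pos:]

poisoned = inject_trigger("今天天气很好", "彳")
print(poisoned)  # trigger lands where it best breaks up repeated characters
```

With this toy scorer the trigger is placed between the repeated "天天", since that insertion removes the only adjacent-duplicate penalty; a real masked-LM scorer would instead rank positions by how plausible the model finds the resulting sentence.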
Related Articles
50 records
  • [1] UOR: Universal Backdoor Attacks on Pre-trained Language Models
    Du, Wei
    Li, Peixuan
    Zhao, Haodong
    Ju, Tianjie
    Ren, Ge
    Liu, Gongshen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: ACL 2024, 2024, : 7865 - 7877
  • [2] Character-Level Syntax Infusion in Pre-Trained Models for Chinese Semantic Role Labeling
    Wang, Yuxuan
    Lei, Zhilin
    Che, Wanxiang
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2021, 12 (12) : 3503 - 3515
  • [3] Aliasing Backdoor Attacks on Pre-trained Models
    Wei, Cheng'an
    Lee, Yeonjoon
    Chen, Kai
    Meng, Guozhu
    Lv, Peizhuo
    PROCEEDINGS OF THE 32ND USENIX SECURITY SYMPOSIUM, 2023, : 2707 - 2724
  • [5] Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks
    Xi, Zhaohan
    Du, Tianyu
    Li, Changjiang
    Pang, Ren
    Ji, Shouling
    Chen, Jinghui
    Ma, Fenglong
    Wang, Ting
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [6] Backdoor Attacks Against Transfer Learning With Pre-Trained Deep Learning Models
    Wang, Shuo
    Nepal, Surya
    Rudolph, Carsten
    Grobler, Marthie
    Chen, Shangyu
    Chen, Tianle
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2022, 15 (03) : 1526 - 1539
  • [7] Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
    Li, Linyang
    Song, Demin
    Li, Xiaonan
    Zeng, Jiehang
    Ma, Ruotian
    Qiu, Xipeng
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 3023 - 3032
  • [8] Enhancing pre-trained language models with Chinese character morphological knowledge
    Zheng, Zhenzhong
    Wu, Xiaoming
    Liu, Xiangzhi
    INFORMATION PROCESSING & MANAGEMENT, 2025, 62 (01)
  • [9] Maximum Entropy Loss, the Silver Bullet Targeting Backdoor Attacks in Pre-trained Language Models
    Liu, Zhengxiao
    Shen, Bowen
    Lin, Zheng
    Wang, Fali
    Wang, Weiping
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 3850 - 3868
  • [10] Multi-target Backdoor Attacks for Code Pre-trained Models
    Li, Yanzhou
    Liu, Shangqing
    Chen, Kangjie
    Xie, Xiaofei
    Zhang, Tianwei
    Liu, Yang
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, VOL 1, 2023, : 7236 - 7254