CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models

Cited by: 0
Authors
He, Xinyu [1 ]
Hao, Fengrui [2 ]
Gu, Tianlong [2 ]
Chang, Liang [1 ]
Affiliations
[1] Guilin Univ Elect Technol, Guilin, Guangxi, Peoples R China
[2] Jinan Univ, Engn Res Ctr Trustworthy AI, Guangzhou, Guangdong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Pre-trained language models; backdoor attacks; Chinese; character;
DOI
10.1145/3678007
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Pre-trained language models (PLMs) aim to assist computers in various domains by providing natural and efficient language interaction and text processing capabilities. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, in which triggers can be injected into the models to make them exhibit attacker-specified behavior. Unfortunately, existing research on backdoor attacks has mainly focused on English PLMs and paid less attention to Chinese PLMs. Moreover, these existing backdoor attacks do not work well against Chinese PLMs. In this article, we disclose the limitations of English-oriented backdoor attacks against Chinese PLMs and propose character-level backdoor attacks (CBAs) against Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies to ensure that the backdoor is reliably triggered while improving the effectiveness of the attack. Then, depending on the attacker's level of access to the training dataset, we develop trigger injection mechanisms based on either target-label similarity or a masked language model, which select the most influential position in a sentence and insert the trigger there to maximize the stealthiness of the attack. Extensive experiments on three major natural language processing tasks across various Chinese and English PLMs demonstrate the effectiveness and stealthiness of our method. In addition, CBAs exhibit strong resistance against three state-of-the-art backdoor defense methods.
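To make the masked-language-model-based injection mechanism described in the abstract more concrete, the sketch below scores every candidate insertion position for a single-character trigger with an off-the-shelf Chinese MLM and inserts the trigger where the model finds it most plausible, as a proxy for stealthiness. This is a minimal illustration only, assuming the Hugging Face `transformers` package and the public `bert-base-chinese` checkpoint; the function name, trigger character, and example sentence are hypothetical and are not the authors' implementation.

```python
# Hypothetical sketch of MLM-guided trigger placement (not the authors' code).
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
mlm = BertForMaskedLM.from_pretrained("bert-base-chinese")
mlm.eval()

def insert_trigger_mlm(sentence: str, trigger_char: str) -> str:
    """Insert a single-character trigger at the position where the masked
    language model judges it most plausible (a stealthiness proxy)."""
    chars = list(sentence)
    best_pos, best_score = 0, float("-inf")
    # Assumes the trigger character exists in the tokenizer vocabulary;
    # otherwise it maps to [UNK] and the score is meaningless.
    trigger_id = tokenizer.convert_tokens_to_ids(trigger_char)
    for pos in range(len(chars) + 1):
        # Build a candidate sentence with [MASK] at the insertion point.
        candidate = "".join(chars[:pos] + [tokenizer.mask_token] + chars[pos:])
        inputs = tokenizer(candidate, return_tensors="pt")
        mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
        with torch.no_grad():
            logits = mlm(**inputs).logits
        # Log-probability the MLM assigns to the trigger at this position.
        score = torch.log_softmax(logits[0, mask_index], dim=-1)[trigger_id].item()
        if score > best_score:
            best_pos, best_score = pos, score
    return "".join(chars[:best_pos] + [trigger_char] + chars[best_pos:])

# Example: poison one training sample with a hypothetical trigger character.
poisoned = insert_trigger_mlm("这部电影的剧情非常精彩", "了")
print(poisoned)
```

A poisoned-sample pair would then be relabeled with the attacker's target class before fine-tuning; the target-label-similarity variant mentioned in the abstract would replace the MLM score with a similarity measure computed against target-label data, which is not shown here.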
Pages: 26