UOR: Universal Backdoor Attacks on Pre-trained Language Models

Cited: 0
Authors
Du, Wei [1 ]
Li, Peixuan [1 ]
Zhao, Haodong [1 ]
Ju, Tianjie [1 ]
Ren, Ge [1 ]
Liu, Gongshen [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Cyber Sci & Engn, Shanghai, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Task-agnostic and transferable backdoors implanted in pre-trained language models (PLMs) pose a severe security threat, as they can be inherited by any downstream task. However, existing methods rely on manual selection of triggers and backdoor representations, limiting their effectiveness and universality across different PLMs and usage paradigms. In this paper, we propose a new backdoor attack method, UOR, which overcomes these limitations by replacing manual selection with automatic optimization. Specifically, we design poisoned supervised contrastive learning, which automatically learns more uniform and universal backdoor representations. This yields more even coverage of the output space, and thus hits more labels in downstream tasks after fine-tuning. Furthermore, we use gradient search to select trigger words that can be adapted to different PLMs and vocabularies. Experiments show that UOR achieves better attack performance than manual methods on various text classification tasks. We also test PLMs with different architectures and usage paradigms, as well as more challenging tasks, achieving higher universality scores.
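The poisoned supervised contrastive learning the abstract describes can be illustrated with a minimal toy sketch (my own illustration, not the authors' code): each trigger is assigned its own pseudo-label, and a SupCon-style loss pulls same-trigger representations together while pushing different triggers (and clean samples) apart, so the backdoor representations spread evenly over the representation space. The embeddings, labels, and function name below are all hypothetical.

```python
import math

def sup_con_loss(embs, labels, tau=0.5):
    """Supervised contrastive loss (SupCon-style): for each anchor, pull
    together samples that share its label and push apart all others.
    In a poisoned variant of the kind the abstract describes, each trigger
    gets its own pseudo-label, driving triggered inputs toward distinct,
    well-separated backdoor representations."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    def normed(a):
        n = math.sqrt(dot(a, a)) or 1.0
        return [x / n for x in a]
    unit = [normed(e) for e in embs]  # cosine similarity via unit vectors
    n, loss, pairs = len(embs), 0.0, 0
    for i in range(n):
        pos = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not pos:
            continue
        # softmax denominator over all non-anchor samples
        denom = sum(math.exp(dot(unit[i], unit[j]) / tau)
                    for j in range(n) if j != i)
        for j in pos:
            loss -= math.log(math.exp(dot(unit[i], unit[j]) / tau) / denom)
            pairs += 1
    return loss / pairs

# Toy 2-D embeddings: pseudo-label 0 = clean, 1 and 2 = two triggers.
embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
        [0.1, 0.9], [-1.0, 0.0], [-0.9, -0.1]]
tight = sup_con_loss(embs, [0, 0, 1, 1, 2, 2])   # labels match the clusters
loose = sup_con_loss(embs, [0, 1, 2, 0, 1, 2])   # labels mix the clusters
```

Minimizing this loss during poisoning would drive each trigger's representations into its own tight cluster; accordingly, the cluster-consistent labeling scores a lower loss than the mixed one (`tight < loose`).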
Pages: 7865-7877 (13 pages)