UOR: Universal Backdoor Attacks on Pre-trained Language Models

Times Cited: 0
Authors
Du, Wei [1 ]
Li, Peixuan [1 ]
Zhao, Haodong [1 ]
Ju, Tianjie [1 ]
Ren, Ge [1 ]
Liu, Gongshen [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Cyber Sci & Engn, Shanghai, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
(none listed)
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Task-agnostic and transferable backdoors implanted in pre-trained language models (PLMs) pose a severe security threat, as they can be inherited by any downstream task. However, existing methods rely on manual selection of triggers and backdoor representations, which limits their effectiveness and universality across different PLMs and usage paradigms. In this paper, we propose a new backdoor attack method, UOR, which overcomes these limitations by replacing manual selection with automatic optimization. Specifically, we design poisoned supervised contrastive learning, which automatically learns more uniform and universal backdoor representations. This yields more even coverage of the output space, and thus hits more labels in downstream tasks after fine-tuning. Furthermore, we use gradient search to select trigger words that can be adapted to different PLMs and vocabularies. Experiments show that UOR achieves better attack performance than manual methods on various text classification tasks. Moreover, we evaluate UOR on PLMs with different architectures, different usage paradigms, and more challenging tasks, where it achieves higher universality scores.
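The core idea of the abstract — pulling each trigger's representations into its own tight, well-separated cluster so that the clusters cover the output space — can be illustrated with a small sketch of a supervised contrastive loss. This is a hedged, hypothetical illustration, not the paper's actual implementation: the function name `supcon_loss`, the temperature `tau=0.1`, and the toy data are all invented here, and the loss shown is the standard SupCon form (Khosla et al.), where every sample stamped with the same trigger shares a "class" label.

```python
import numpy as np

def supcon_loss(z, labels, tau=0.1):
    """Supervised contrastive loss over L2-normalized representations z (n, d).

    In a poisoned setting, `labels` marks which trigger (if any) was inserted
    into each sample, so all samples carrying the same trigger are treated as
    positives and pulled toward a shared backdoor representation, while
    different triggers are pushed apart.
    """
    n = z.shape[0]
    sim = z @ z.T / tau                         # scaled pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)              # exclude self-pairs from the denominator
    m = sim.max(axis=1, keepdims=True)          # numerically stable log-sum-exp per row
    log_denom = m + np.log(np.exp(sim - m).sum(axis=1, keepdims=True))
    log_prob = sim - log_denom                  # log p(anchor i picks sample j)
    pos = (labels[:, None] == labels[None, :]) & ~np.eye(n, dtype=bool)
    loss_i = -np.where(pos, log_prob, 0.0).sum(axis=1) / np.maximum(pos.sum(axis=1), 1)
    return float(loss_i.mean())

rng = np.random.default_rng(0)
# Three hypothetical triggers, each tied to a well-separated target direction.
targets = np.eye(3, 8)                          # rows e0, e1, e2 in R^8
labels = np.repeat(np.arange(3), 4)             # 4 poisoned samples per trigger
clustered = targets[labels] + 0.05 * rng.normal(size=(12, 8))
clustered /= np.linalg.norm(clustered, axis=1, keepdims=True)
scattered = rng.normal(size=(12, 8))            # same labels, untrained representations
scattered /= np.linalg.norm(scattered, axis=1, keepdims=True)

l_good = supcon_loss(clustered, labels)         # tight per-trigger clusters: low loss
l_bad = supcon_loss(scattered, labels)          # random spread: high loss
```

Minimizing this loss drives same-trigger representations together and distinct triggers apart, which is one plausible way to obtain the "uniform" backdoor representations the abstract describes; the paper's actual poisoned objective and the gradient search over trigger words are not sketched here.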
Pages: 7865-7877 (13 pages)
Related Papers (50 total)
  • [1] Wei, Cheng'an; Lee, Yeonjoon; Chen, Kai; Meng, Guozhu; Lv, Peizhuo. Aliasing Backdoor Attacks on Pre-trained Models. Proceedings of the 32nd USENIX Security Symposium, 2023: 2707-2724.
  • [2] Li, Linyang; Song, Demin; Li, Xiaonan; Zeng, Jiehang; Ma, Ruotian; Qiu, Xipeng. Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning. 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021), 2021: 3023-3032.
  • [3] Zhang, Zhengyan; Xiao, Guangxuan; Li, Yongwei; Lv, Tian; Qi, Fanchao; Liu, Zhiyuan; Wang, Yasheng; Jiang, Xin; Sun, Maosong. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks. Machine Intelligence Research, 2023, 20(2): 180-193.
  • [4] He, Xinyu; Hao, Fengrui; Gu, Tianlong; Chang, Liang. CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models. ACM Transactions on Privacy and Security, 2024, 27(3).
  • [5] Xi, Zhaohan; Du, Tianyu; Li, Changjiang; Pang, Ren; Ji, Shouling; Chen, Jinghui; Ma, Fenglong; Wang, Ting. Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks. Advances in Neural Information Processing Systems 36 (NeurIPS 2023), 2023.
  • [6] Liu, Zhengxiao; Shen, Bowen; Lin, Zheng; Wang, Fali; Wang, Weiping. Maximum Entropy Loss, the Silver Bullet Targeting Backdoor Attacks in Pre-trained Language Models. Findings of the Association for Computational Linguistics: ACL 2023, 2023: 3850-3868.
  • [7] Li, Yanzhou; Liu, Shangqing; Chen, Kangjie; Xie, Xiaofei; Zhang, Tianwei; Liu, Yang. Multi-target Backdoor Attacks for Code Pre-trained Models. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL 2023), Vol. 1, 2023: 7236-7254.
  • [8] Du, Wei; Zhao, Yichun; Li, Boqun; Liu, Gongshen; Wang, Shilin. PPT: Backdoor Attacks on Pre-trained Models via Poisoned Prompt Tuning. Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence (IJCAI 2022), 2022: 680-686.
  • [9] Wang, Shuo; Nepal, Surya; Rudolph, Carsten; Grobler, Marthie; Chen, Shangyu; Chen, Tianle. Backdoor Attacks Against Transfer Learning With Pre-Trained Deep Learning Models. IEEE Transactions on Services Computing, 2022, 15(3): 1526-1539.
  • [10] Zhu, Biru; Qin, Yujia; Cui, Ganqu; Chen, Yangyi; Zhao, Weilin; Fu, Chong; Deng, Yangdong; Liu, Zhiyuan; Wang, Jingang; Wu, Wei; Sun, Maosong; Gu, Ming. Moderate-fitting as a Natural Backdoor Defender for Pre-trained Language Models. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.