Towards Defending against Adversarial Examples via Attack-Invariant Features

Cited: 0
Authors
Zhou, Dawei [1 ,2 ]
Liu, Tongliang [2 ]
Han, Bo [3 ]
Wang, Nannan [1 ]
Peng, Chunlei [4 ]
Gao, Xinbo [5 ]
Affiliations
[1] Xidian Univ, Sch Telecommun Engn, State Key Lab Integrated Serv Networks, Xian, Shaanxi, Peoples R China
[2] Univ Sydney, Sch Comp Sci, Trustworthy Machine Learning Lab, Sydney, NSW, Australia
[3] Hong Kong Baptist Univ, Dept Comp Sci, Hong Kong, Peoples R China
[4] Xidian Univ, State Key Lab Integrated Serv Networks, Sch Cyber Engn, Xian, Shaanxi, Peoples R China
[5] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Image Cognit, Chongqing, Peoples R China
Funding
Australian Research Council; National Natural Science Foundation of China
Keywords
CORTEX;
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Deep neural networks (DNNs) are vulnerable to adversarial noise. Their adversarial robustness can be improved by training on adversarial examples. However, because attacks evolve continuously, models trained on seen types of adversarial examples generally do not generalize well to unseen types. To address this problem, we propose to remove adversarial noise by learning invariant features that generalize across attacks while preserving semantic classification information. Specifically, we introduce an adversarial feature learning mechanism that disentangles attack-invariant features from adversarial noise, together with a normalization term in the encoded space of these features that addresses the bias between seen and unseen types of attacks. Empirical evaluations demonstrate that our method can provide better protection than previous state-of-the-art approaches, especially against unseen types of attacks and adaptive attacks.
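This record gives no implementation details, but the mechanism the abstract describes (an encoder trained to disentangle attack-invariant features, with a normalization term aligning adversarial and natural features in the encoded space) can be sketched roughly as below. This is a minimal PyTorch sketch under assumptions, not the authors' method: the architecture, all names (Encoder, invariance_loss, lambda_norm), and the L2 form of the normalization term are illustrative assumptions.

```python
# Minimal sketch of the abstract's idea, NOT the paper's actual method.
# Assumptions: the encoder architecture, the L2 form of the normalization
# term, and all names (Encoder, invariance_loss, lambda_norm) are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Maps an image to a feature vector intended to be attack-invariant."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )

    def forward(self, x):
        return self.net(x)

def invariance_loss(encoder, classifier, x_nat, x_adv, y, lambda_norm=1.0):
    """Cross-entropy on adversarial inputs plus a normalization term that
    pulls adversarial features towards natural-image features in the
    encoded space, encouraging features shared across attack types."""
    z_nat = encoder(x_nat)   # features of clean images
    z_adv = encoder(x_adv)   # features of their adversarial counterparts
    ce = F.cross_entropy(classifier(z_adv), y)
    norm_term = F.mse_loss(z_adv, z_nat.detach())
    return ce + lambda_norm * norm_term

# Usage (shapes assumed): x_nat, x_adv are (N, 3, H, W) batches, y labels.
# encoder = Encoder(); classifier = nn.Linear(128, num_classes)
# loss = invariance_loss(encoder, classifier, x_nat, x_adv, y)
```

Detaching z_nat keeps the natural-image features as a fixed target so the penalty moves only the adversarial features; the paper's actual disentanglement and normalization details may differ.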
Pages: 11
Related Papers
50 items in total
  • [41] A Synergetic Attack against Neural Network Classifiers combining Backdoor and Adversarial Examples
    Liu, Guanxiong
    Khalil, Issa
    Khreishah, Abdallah
    Phan, NhatHai
    2021 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2021, : 834 - 846
  • [42] Attention-guided transformation-invariant attack for black-box adversarial examples
    Zhu, Jiaqi
    Dai, Feng
    Yu, Lingyun
    Xie, Hongtao
    Wang, Lidong
    Wu, Bo
    Zhang, Yongdong
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (05) : 3142 - 3165
  • [44] Towards the transferable audio adversarial attack via ensemble methods
    Guo, Feng
    Sun, Zheng
    Chen, Yuxuan
    Ju, Lei
    CYBERSECURITY, 2023, 6 (01)
  • [45] Defending ChatGPT against jailbreak attack via self-reminders
    Xie, Yueqi
    Yi, Jingwei
    Shao, Jiawei
    Curl, Justin
    Lyu, Lingjuan
    Chen, Qifeng
    Xie, Xing
    Wu, Fangzhao
    NATURE MACHINE INTELLIGENCE, 2023, 5 (12) : 1486 - 1496
  • [46] Towards robust DeepFake distortion attack via adversarial autoaugment
    Guo, Qi
    Pang, Shanmin
    Chen, Zhikai
    Guo, Qing
    NEUROCOMPUTING, 2025, 617
  • [48] Defending against Membership Inference Attacks in Federated learning via Adversarial Example
    Xie, Yuanyuan
    Chen, Bing
    Zhang, Jiale
    Wu, Di
    2021 17TH INTERNATIONAL CONFERENCE ON MOBILITY, SENSING AND NETWORKING (MSN 2021), 2021, : 153 - 160
  • [49] Defending Video Recognition Model Against Adversarial Perturbations via Defense Patterns
    Lee, Hong Joo
    Ro, Yong Man
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2024, 21 (04) : 4110 - 4121
  • [50] Defending against adversarial attacks on graph neural networks via similarity property
    Yao, Minghong
    Yu, Haizheng
    Bian, Hong
    AI COMMUNICATIONS, 2023, 36 (01) : 27 - 39