Eyes Closed, Safety on: Protecting Multimodal LLMs via Image-to-Text Transformation

Cited by: 0
Authors
Gou, Yunhao [1 ,2 ]
Chen, Kai [2 ]
Liu, Zhili [2 ,3 ]
Hong, Lanqing [3 ]
Xu, Hang [3 ]
Li, Zhenguo [3 ]
Yeung, Dit-Yan [2 ]
Kwok, James T. [2 ]
Zhang, Yu [1 ]
Affiliations
[1] Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Clear Water Bay, Hong Kong, Peoples R China
[3] Huawei Noahs Ark Lab, Hong Kong, Peoples R China
Source
Computer Vision – ECCV 2024 (Lecture Notes in Computer Science, Springer)
Keywords
Multimodal LLMs; Safety; Image-to-Text Transformation
DOI
10.1007/978-3-031-72643-9_23
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Multimodal large language models (MLLMs) have shown impressive reasoning abilities, but they are also more vulnerable to jailbreak attacks than their LLM predecessors. Although MLLMs can still detect unsafe responses, we observe that the safety mechanisms of their pre-aligned LLMs are easily bypassed once image features are introduced. To build robust MLLMs, we propose ECSO (Eyes Closed, Safety On), a novel training-free protection method that exploits the inherent safety awareness of MLLMs: it generates safer responses by adaptively transforming unsafe images into text, thereby activating the intrinsic safety mechanism of the pre-aligned LLM inside the MLLM. Experiments on five state-of-the-art (SoTA) MLLMs demonstrate that ECSO significantly enhances model safety (e.g., a 37.6% improvement on MM-SafetyBench (SD+OCR) and 71.3% on VLSafe with LLaVA-1.5-7B), while consistently maintaining utility on common MLLM benchmarks. Furthermore, we show that ECSO can be used as a data engine to generate supervised fine-tuning (SFT) data for MLLM alignment without extra human intervention.
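The following is a minimal sketch, not the paper's implementation, of the adaptive strategy the abstract describes: answer with the image, have the model judge its own answer, and if that answer is judged unsafe, caption the image and re-answer from text alone so the pre-aligned LLM's safety alignment re-engages. All names here (the `MLLM` interface, `generate`, and the prompt wording) are hypothetical stand-ins; ECSO's actual prompts and decision rules are specified in the paper.

```python
from typing import Optional, Protocol


class MLLM(Protocol):
    """Hypothetical interface for a multimodal LLM backend (assumed)."""

    def generate(self, prompt: str, image: Optional[bytes] = None) -> str:
        ...


def ecso_respond(model: MLLM, image: bytes, query: str) -> str:
    """Sketch of ECSO's training-free, two-phase response strategy."""
    # Phase 1 ("eyes open"): answer the query with the image attached.
    answer = model.generate(prompt=query, image=image)

    # Self-check: exploit the MLLM's own safety awareness by asking it to
    # judge its answer as text only. (Prompt wording is illustrative.)
    verdict = model.generate(
        prompt="Is the following response harmful or unsafe? "
               f"Answer YES or NO.\n\nResponse: {answer}"
    )
    if "YES" not in verdict.upper():
        return answer  # judged safe: keep the direct multimodal answer

    # Phase 2 ("eyes closed"): transform the image into text so that only
    # text reaches the pre-aligned LLM, re-activating its intrinsic
    # safety mechanism.
    caption = model.generate(prompt="Describe this image in detail.",
                             image=image)
    return model.generate(
        prompt=f"Image description: {caption}\n\nQuestion: {query}"
    )
```

Because the image-to-text transformation is applied only when the self-check flags the direct answer, benign queries still receive the normal multimodal response, which is how the method preserves utility on standard benchmarks.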
Pages: 388-404 (17 pages)