Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models

Cited by: 0
Authors
Yang, Hao [1]
Qu, Lizhen [1]
Shareghi, Ehsan [1]
Haffari, Gholamreza [1]
Affiliations
[1] Department of Data Science & AI, Monash University, Australia
Source
Keywords
Achilles' heel - Condition - Language model - Multi-modal information - Multimodal inputs - Multimodal models - Non-speech audio - Real-world - Red teaming - Text format
DOI
Not available
Related papers
  • [1] Against The Achilles' Heel: A Survey on Red Teaming for Generative Models
    Lin, Lizhi
    Mu, Honglin
    Zhai, Zenan
    Wang, Minghan
    Wang, Yuxia
    Wang, Renxi
    Gao, Junjie
    Zhang, Yixuan
    Che, Wanxiang
    Baldwin, Timothy
    Han, Xudong
    Li, Haonan
    JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2025, 82 : 687 - 775
  • [2] Images are Achilles’ Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
    Li, Yifan
    Guo, Hangyu
    Zhou, Kun
    Zhao, Wayne Xin
    Wen, Ji-Rong
    arXiv
  • [3] Images are Achilles' Heel of Alignment: Exploiting Visual Vulnerabilities for Jailbreaking Multimodal Large Language Models
    Li, Yifan
    Guo, Hangyu
    Zhou, Kun
    Zhao, Wayne Xin
    Wen, Ji-Rong
    COMPUTER VISION - ECCV 2024, PT LXXIII, 2025, 15131 : 174 - 189
  • [4] Audio in Multimodal Applications
    Rumsey, Francis
    JOURNAL OF THE AUDIO ENGINEERING SOCIETY, 2010, 58 (03) : 191 - 195
  • [5] Audio in multimodal applications
    Rumsey, Francis
    AES: Journal of the Audio Engineering Society, 2010, 58 (03) : 191 - 195
  • [6] Audio-LLM: Activating the Capabilities of Large Language Models to Comprehend Audio Data
    Tang, Dongting Chenchong
    Liu, Han
    ADVANCES IN NEURAL NETWORKS-ISNN 2024, 2024, 14827 : 133 - 142
  • [7] Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
    Liu, Jizhong
    Li, Gang
    Zhang, Junbo
    Dinkel, Heinrich
    Wang, Yongqing
    Yan, Zhiyong
    Wang, Yujun
    Wang, Bin
    INTERSPEECH 2024, 2024 : 1135 - 1139
  • [8] Training Audio Captioning Models Without Audio
    Deshmukh, Soham
    Elizalde, Benjamin
    Emmanouilidou, Dimitra
    Raj, Bhiksha
    Singh, Rita
    Wang, Huaming
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024 : 371 - 375
  • [9] Improving Audio Explanations Using Audio Language Models
    Akman, Alican
    Sun, Qiyang
    Schuller, Björn W.
    IEEE SIGNAL PROCESSING LETTERS, 2025, 32 : 741 - 745
  • [10] Multimodal audio guide for museums and exhibitions
    Gebbensleben, S
    Dittmann, J
    Vielhauer, C
    MULTIMEDIA ON MOBILE DEVICES II, 2006, 6074