Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models

Authors
Yang, Hao [1]
Qu, Lizhen [1]
Shareghi, Ehsan [1]
Haffari, Gholamreza [1]
Affiliations
[1] Department of Data Science & AI, Monash University, Australia
Keywords
Achilles' heel - Condition - Language model - Multi-modal information - Multimodal inputs - Multimodal models - Non-speech audio - Real-world - Red teaming - Text format
DOI
Not available
Related papers
50 in total
  • [21] Accommodating Audio Modality in CLIP for Multimodal Processing
    Ruan, Ludan
    Hu, Anwen
    Song, Yuqing
    Zhang, Liang
    Zheng, Sipeng
    Jin, Qin
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 9641 - 9649
  • [22] Audio-visual integration in multimodal communication
    Chen, T
    Rao, RR
    PROCEEDINGS OF THE IEEE, 1998, 86 (05) : 837 - 852
  • [23] PAM: Prompting Audio-Language Models for Audio Quality Assessment
    Deshmukh, Soham
    Alharthi, Dareen
    Elizalde, Benjamin
    Gamper, Hannes
    Al Ismail, Mahmoud
    Singh, Rita
    Raj, Bhiksha
    Wang, Huaming
    INTERSPEECH 2024, 2024, : 3320 - 3324
  • [24] Multimodal Chinese Event Extraction on Text and Audio
    Zhang, Xinlang
    Wang, Zhongqing
    Li, Peifeng
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [25] Diffusion Models for Audio Restoration
    Lemercier, Jean-Marie
    Richter, Julius
    Welker, Simon
    Moliner, Eloi
    Välimäki, Vesa
    Gerkmann, Timo
    IEEE SIGNAL PROCESSING MAGAZINE, 2024, 41 (06) : 72 - 84
  • [26] Compositional Models for Audio Processing
    Virtanen, Tuomas
    Gemmeke, Jort F.
    Raj, Bhiksha
    Smaragdis, Paris
    IEEE SIGNAL PROCESSING MAGAZINE, 2015, 32 (02) : 125 - 144
  • [27] Tracking multimodal cohesion in Audio Description: Examples from a Dutch audio-description corpus
    Reviers, Nina
    LINGUISTICA ANTVERPIENSIA NEW SERIES-THEMES IN TRANSLATION STUDIES, 2018, 17 : 22 - 35
  • [28] Multimodal Affect Models: An Investigation of Relative Salience of Audio and Visual Cues for Emotion Prediction
    Wu, Jingyao
    Dang, Ting
    Sethu, Vidhyasaharan
    Ambikairajah, Eliathamby
    FRONTIERS IN COMPUTER SCIENCE, 2021, 3
  • [29] PCM-MULTIPLEXED AUDIO IN A LARGE AUDIO-VIDEO ROUTING SWITCHER
    BUTLER, RJ
    SMPTE JOURNAL, 1976, 85 (11) : 875 - 877
  • [30] Does Audio help in deep Audio-Visual Saliency prediction models?
    Agrawal, Ritvik
    Jyoti, Shreyank
    Girmaji, Rohit
    Sivaprasad, Sarath
    Gandhi, Vineet
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2022, 2022, : 48 - 56