Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models

Cited by: 0
Authors
Yang, Hao [1]
Qu, Lizhen [1]
Shareghi, Ehsan [1]
Haffari, Gholamreza [1]
Affiliations
[1] Department of Data Science & AI, Monash University, Australia
Keywords
Achilles' heel - Condition - Language model - Multi-modal information - Multimodal inputs - Multimodal models - Non-speech audio - Real-world - Red teaming - Text format
DOI
Not available
Related papers
50 in total
  • [41] Multimodal Music Mood Classification using Audio and Lyrics
    Laurier, Cyril
    Grivolla, Jens
    Herrera, Perfecto
    SEVENTH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, PROCEEDINGS, 2008, : 688 - +
  • [42] Audio-Visual Learning for Multimodal Emotion Recognition
    Fan, Siyu
    Jing, Jianan
    Wang, Chongwen
    SYMMETRY-BASEL, 2025, 17 (03)
  • [43] Voice EHR: introducing multimodal audio data for health
    Anibal, James
    Huth, Hannah
    Li, Ming
    Hazen, Lindsey
    Daoud, Veronica
    Ebedes, Dominique
    Lam, Yen Minh
    Nguyen, Hang
    Hong, Phuc Vo
    Kleinman, Michael
    Ost, Shelley
    Jackson, Christopher
    Sprabery, Laura
    Elangovan, Cheran
    Krishnaiah, Balaji
    Akst, Lee
    Lina, Ioan
    Elyazar, Iqbal
    Ekawati, Lenny
    Jansen, Stefan
    Nduwayezu, Richard
    Garcia, Charisse
    Plum, Jeffrey
    Brenner, Jacqueline
    Song, Miranda
    Ricotta, Emily
    Clifton, David
    Thwaites, C. Louise
    Bensoussan, Yael
    Wood, Bradford
    FRONTIERS IN DIGITAL HEALTH, 2025, 6
  • [44] Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
    Lu, Jiasen
    Clark, Christopher
    Lee, Sangho
    Zhang, Zichen
    Khosla, Savya
    Marten, Ryan
    Hoiem, Derek
    Kembhavi, Aniruddha
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 26429 - 26445
  • [45] Multimodal Prediction of Alexithymia from Physiological and Audio Signals
    Filippou, Valeria
    Nicolaou, Mihalis A.
    Theodosiou, Nikolas
    Panayiotou, Georgia
    Contantinou, Elena
    Theodorou, Marios
    Panteli, Maria
    2023 11TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS, ACIIW, 2023
  • [46] Multimodal speaker identification with audio-video processing
    Yemez, Y
    Kanak, A
    Erzin, E
    Tekalp, AM
    2003 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL 3, PROCEEDINGS, 2003, : 5 - 8
  • [47] Multimodal Depression Recognition with Dynamic Visual and Audio Cues
    He, Lang
    Jiang, Dongmei
    Sahli, Hichem
    2015 INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION (ACII), 2015, : 260 - 266
  • [48] The role of respiration audio in multimodal analysis of movement qualities
    Lussu, Vincenzo
    Niewiadomski, Radoslaw
    Volpe, Gualtiero
    Camurri, Antonio
    Journal on Multimodal User Interfaces, 2020, 14 : 1 - 15
  • [49] Audio Visual Multimodal Classification of Bipolar Disorder Episodes
    Li, Yan
    Yang, Le
    Chen, Haifeng
    Jiang, Dongmei
    Sahli, Hichem
    2019 8TH INTERNATIONAL CONFERENCE ON AFFECTIVE COMPUTING AND INTELLIGENT INTERACTION WORKSHOPS AND DEMOS (ACIIW), 2019, : 115 - 120
  • [50] Haptic, Audio, and Visual: Multimodal Distribution for Interactive Games
    Gaudina, Marco
    Zappi, Victor
    Brogni, Andrea
    Caldwell, Darwin G.
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2012, 61 (11) : 3103 - 3111