Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering

Authors
Xu, Yaoxun [1]
Zhou, Yixuan [1]
Cai, Yunrui [1]
Xie, Jingran [1]
Ye, Runchuan [1]
Wu, Zhiyong [1]
Affiliations
[1] Tsinghua University, Shenzhen International Graduate School, Shenzhen, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Multimodal Emotion Recognition; Large Language Model; Emotion Caption; Prompt Engineering; RECOGNITION
DOI
10.1145/3689092.3689403
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper addresses the Open Vocabulary (OV) task of MER 2024, which extends multimodal emotion recognition beyond the traditional fixed label space. The study uses Large Language Models (LLMs) to interpret and extract emotional information from multimodal inputs, complemented by speech transcription, speech emotion description, and video clues. These features are integrated into a prompt fed into a pre-trained LLaMA3-8B model, and prompt engineering is used to achieve satisfactory results without fine-tuning. This approach bridges the gap between speech, video, and text data, applying LLMs to open-ended emotion recognition tasks.
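The prompt-assembly step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_emotion_prompt`, the section labels, and the instruction wording are all assumptions made for illustration, and the call to an actual LLaMA3-8B model is left out.

```python
def build_emotion_prompt(transcript: str, speech_emotion: str, video_clues: str) -> str:
    """Combine three textual modality descriptions into one prompt string.

    Hypothetical helper: the labels and instruction text are illustrative
    assumptions, not the prompt used in the paper.
    """
    sections = [
        ("Speech transcription", transcript),
        ("Speech emotion description", speech_emotion),
        ("Video clues", video_clues),
    ]
    # Drop modalities with no usable text so the prompt has no empty fields.
    lines = [f"{name}: {text.strip()}" for name, text in sections if text and text.strip()]
    instruction = (
        "Based on the clues below, list the emotions the speaker is likely "
        "expressing. Use any emotion words that fit; do not restrict "
        "yourself to a fixed label set."
    )
    body = "\n".join(lines)
    return f"{instruction}\n\n{body}\n\nEmotions:"


if __name__ == "__main__":
    prompt = build_emotion_prompt(
        transcript="I can't believe you did that for me.",
        speech_emotion="The voice is warm and slightly trembling.",
        video_clues="The speaker smiles and wipes their eyes.",
    )
    # The resulting string would be passed to a pre-trained LLM without fine-tuning.
    print(prompt)
```

Keeping each modality on its own labeled line makes it easy to omit a missing modality or to reword the instruction while experimenting with prompts.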
Pages: 104-109
Number of pages: 6