Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering

Cited by: 0
Authors
Xu, Yaoxun [1]
Zhou, Yixuan [1]
Cai, Yunrui [1]
Xie, Jingran [1]
Ye, Runchuan [1]
Wu, Zhiyong [1]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal Emotion Recognition; Large Language Model; Emotion Caption; Prompt Engineering; RECOGNITION
DOI
10.1145/3689092.3689403
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the challenges in MER 2024 by focusing on the Open Vocabulary (OV) task, which extends beyond the traditional fixed label space for multimodal emotion recognition. The study emphasizes the use of Large Language Models (LLMs) to interpret and extract emotional information from multimodal inputs, complemented by speech transcription, speech emotion description, and video clues. The paper explores the integration of these features into a prompt fed into a pre-trained LLaMA3-8B model, utilizing prompt engineering to achieve satisfactory results without fine-tuning. This approach bridges the gap between speech, video, and text data, leveraging the full potential of LLMs for open-ended emotion recognition tasks and introducing a solution to the field.
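The prompt-assembly idea in the abstract can be pictured with a minimal Python sketch. It assumes the three clue sources (speech transcription, speech emotion description, video clues) are already available as plain text. The function name `build_emotion_prompt`, the section labels, the instruction wording, and the sample inputs are all hypothetical illustrations, not the authors' actual prompt, and the resulting string would still need to be passed to a model such as LLaMA3-8B by whatever inference setup is in use.

```python
def build_emotion_prompt(transcript: str, speech_description: str, video_clues: str) -> str:
    """Combine three textual clue sources into one open-vocabulary emotion prompt.

    All labels and the instruction text are illustrative assumptions,
    not the prompt used in the paper.
    """
    sections = [
        ("Speech transcription", transcript),
        ("Speech emotion description", speech_description),
        ("Video clues", video_clues),
    ]
    # Skip empty sources so the prompt does not contain blank sections.
    body = "\n".join(
        f"{title}: {text.strip()}" for title, text in sections if text.strip()
    )
    instruction = (
        "Based on the clues above, list the emotions the speaker is most likely "
        "expressing. Use any emotion words you find appropriate, separated by commas."
    )
    return f"{body}\n\n{instruction}"


if __name__ == "__main__":
    # Hypothetical inputs for illustration only.
    prompt = build_emotion_prompt(
        transcript="I can't believe you did that.",
        speech_description="The voice is loud and tense, with a rising pitch.",
        video_clues="The speaker frowns and leans forward.",
    )
    print(prompt)
```

Because the instruction asks for free-form emotion words rather than a choice from a fixed list, the output is not restricted to a predefined label set, which is the point of the open-vocabulary setting described above.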
Pages: 104-109
Page count: 6