Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering

Cited by: 0
Authors
Xu, Yaoxun [1]
Zhou, Yixuan [1]
Cai, Yunrui [1]
Xie, Jingran [1]
Ye, Runchuan [1]
Wu, Zhiyong [1]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal Emotion Recognition; Large Language Model; Emotion Caption; Prompt Engineering; RECOGNITION
DOI
10.1145/3689092.3689403
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the challenges of MER 2024 by focusing on the Open Vocabulary (OV) task, which extends multimodal emotion recognition beyond the traditional fixed label space. The study emphasizes the use of Large Language Models (LLMs) to interpret and extract emotional information from multimodal inputs, complemented by speech transcription, speech emotion description, and video clues. The paper explores integrating these features into a prompt fed to a pre-trained LLaMA3-8B model, using prompt engineering to achieve satisfactory results without fine-tuning. This approach bridges the gap between speech, video, and text data, leveraging LLMs for open-ended emotion recognition tasks.
Pages: 104-109
Page count: 6