Multimodal Emotion Captioning Using Large Language Model with Prompt Engineering

Authors
Xu, Yaoxun [1]
Zhou, Yixuan [1]
Cai, Yunrui [1]
Xie, Jingran [1]
Ye, Runchuan [1]
Wu, Zhiyong [1]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Multimodal Emotion Recognition; Large Language Model; Emotion Caption; Prompt Engineering; RECOGNITION
DOI
10.1145/3689092.3689403
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the challenges in MER 2024 by focusing on the Open Vocabulary (OV) task, which extends beyond traditional fixed label space for multimodal emotion recognition. The study emphasizes the use of Large Language Models (LLMs) to interpret and extract emotional information from multimodal inputs, complemented by speech transcription, speech emotion description, and video clues. The paper explores the integration of these features into a prompt fed into a pre-trained LLaMA3-8B model, utilizing prompt engineering to achieve satisfactory results without fine-tuning. This approach bridges the gap between speech, video and text data, leveraging the full potential of LLMs for open-ended emotion recognition tasks and introducing a solution to the field.
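The prompt-assembly step described in the abstract can be illustrated with a minimal sketch. It only shows how three text clues (speech transcription, speech emotion description, video clues) might be merged into one prompt string for a pre-trained LLM. The class name `ClipClues`, the function `build_prompt`, the prompt wording, and the sample clip are all assumptions made for illustration; the record does not give the authors' actual prompt template or inference code.

```python
from dataclasses import dataclass


@dataclass
class ClipClues:
    """Text clues extracted from one clip (all field names are hypothetical)."""
    transcript: str       # speech transcription
    speech_emotion: str   # natural-language description of vocal emotion
    video_clues: str      # natural-language description of visual cues


def build_prompt(clues: ClipClues) -> str:
    """Merge the three clue types into a single open-vocabulary prompt.

    The wording below is an illustrative assumption, not the paper's template.
    """
    return (
        "You are an expert in emotion analysis.\n"
        f"Speech transcription: {clues.transcript}\n"
        f"Speech emotion description: {clues.speech_emotion}\n"
        f"Video clues: {clues.video_clues}\n"
        "List the emotions the speaker is most likely expressing, "
        "using any words you find appropriate."
    )


# Made-up sample clip, used only to show the prompt layout.
example = ClipClues(
    transcript="I can't believe you did that.",
    speech_emotion="The voice is loud and tense, with a rising pitch.",
    video_clues="The speaker frowns and leans forward.",
)
prompt = build_prompt(example)
print(prompt)
```

The resulting string would then be passed to the language model as ordinary text input. Because the final instruction does not restrict the answer to a fixed label set, the model is free to return any emotion words, which is the point of the open-vocabulary setting.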
Pages: 104-109
Number of pages: 6