Multi-modal humor segment prediction in video

Cited by: 0
Authors
Zekun Yang
Yuta Nakashima
Haruo Takemura
Affiliations
[1] Nagoya University, Information Technology Center
[2] Osaka University, Institute for Datability Science
[3] Osaka University, Cyber Media Center
Source
Multimedia Systems | 2023 / Volume 29
Keywords
Humor prediction; Vision and language; Multi-modal;
DOI
Not available
Abstract
Humor can be induced by various signals in the visual, linguistic, and vocal modalities emitted by humans. Finding humor in videos is an interesting but challenging task for an intelligent system. Previous methods predict humor at the sentence level given some text (e.g., a speech transcript), sometimes together with other modalities, such as video and speech. Such methods ignore humor caused by the visual modality by design, since their predictions are made per sentence. In this work, we first provide new humor annotations for a sitcom by deriving temporal segments of ground-truth humor from the laughter track. Then, we propose a method to find these temporal segments of humor. We adopt a sliding-window approach, where the visual modality is described by pose and facial features and the linguistic modality is given as the subtitles in each window. We use long short-term memory (LSTM) networks to encode the temporal dependencies in poses and facial features and pre-trained BERT to handle subtitles. Experimental results show that our method improves the performance of humor prediction.
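The annotation scheme described in the abstract (ground-truth humor segments derived from the laughter track, then labeled sliding windows) can be sketched as follows. This is a minimal, library-free illustration, not the authors' implementation; the merge margin, window length, stride, and overlap threshold are all assumed values that the abstract does not specify.

```python
def humor_segments(laughter_times, margin=1.0):
    """Merge laughter-track timestamps (seconds) into ground-truth humor
    segments: consecutive laughs closer than `margin` seconds are merged
    into one segment. `margin` is an assumed parameter."""
    segments = []
    for t in sorted(laughter_times):
        if segments and t - segments[-1][1] <= margin:
            segments[-1][1] = t          # extend the current segment
        else:
            segments.append([t, t])      # start a new segment
    return [(start, end) for start, end in segments]


def window_labels(duration, segments, win=2.0, stride=1.0, min_overlap=0.5):
    """Slide a fixed-length window over the video and label each window 1
    if it overlaps a humor segment by at least `min_overlap` seconds.
    Window length, stride, and threshold are assumed values."""
    labels = []
    t = 0.0
    while t + win <= duration:
        overlap = sum(max(0.0, min(t + win, e) - max(t, s))
                      for s, e in segments)
        labels.append((t, t + win, int(overlap >= min_overlap)))
        t += stride
    return labels
```

For example, laughter timestamps `[3.0, 3.5, 4.0, 10.0]` merge into the segments `(3.0, 4.0)` and `(10.0, 10.0)`, and in a 12-second video the 2-second windows starting at 2.0 s and 3.0 s would be labeled as humorous. In the paper's pipeline, each positive or negative window would then be represented by its pose/facial feature sequence (encoded with LSTMs) and its subtitles (encoded with pre-trained BERT) before classification.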
Pages: 2389-2398
Number of pages: 9
Related papers
50 records in total
  • [1] Multi-modal humor segment prediction in video
    Yang, Zekun
    Nakashima, Yuta
    Takemura, Haruo
    MULTIMEDIA SYSTEMS, 2023, 29 (04) : 2389 - 2398
  • [2] Adaptive Multi-Modal Ensemble Network for Video Memorability Prediction
    Li, Jing
    Guo, Xin
    Yue, Fumei
    Xue, Fanfu
    Sun, Jiande
    APPLIED SCIENCES-BASEL, 2022, 12 (17):
  • [3] Multi-modal Video Summarization
    Huang, Jia-Hong
    ICMR 2024 - Proceedings of the 2024 International Conference on Multimedia Retrieval, 2024, : 1214 - 1218
  • [4] Multi-modal Video Summarization
    Huang, Jia-Hong
    PROCEEDINGS OF THE 4TH ANNUAL ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2024, 2024, : 1214 - 1218
  • [5] HMNet: a hierarchical multi-modal network for educational video concept prediction
    Huang, Wei
    Xiao, Tong
    Liu, Qi
    Huang, Zhenya
    Ma, Jianhui
    Chen, Enhong
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2023, 14 (09) : 2913 - 2924
  • [6] HMNet: a hierarchical multi-modal network for educational video concept prediction
    Wei Huang
    Tong Xiao
    Qi Liu
    Zhenya Huang
    Jianhui Ma
    Enhong Chen
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 2913 - 2924
  • [7] Multi-modal fusion for video understanding
    Hoogs, A
    Mundy, J
    Cross, G
    30TH APPLIED IMAGERY PATTERN RECOGNITION WORKSHOP, PROCEEDINGS: ANALYSIS AND UNDERSTANDING OF TIME VARYING IMAGERY, 2001, : 103 - 108
  • [8] Multi-modal Dense Video Captioning
    Iashin, Vladimir
    Rahtu, Esa
    2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020), 2020, : 4117 - 4126
  • [9] Automated Multi-Modal Video Editing for Ads Video
    Lin, Qin
    Pang, Nuo
    Hong, Zhiying
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4823 - 4827
  • [10] Multi-Modal Audio, Video and Physiological Sensor Learning for Continuous Emotion Prediction
    Brady, Kevin
    Gwon, Youngjune
    Khorrami, Pooya
    Godoy, Elizabeth
    Campbell, William
    Dagli, Charlie
    Huang, Thomas S.
    PROCEEDINGS OF THE 6TH INTERNATIONAL WORKSHOP ON AUDIO/VISUAL EMOTION CHALLENGE (AVEC'16), 2016, : 97 - 104