Multi-modal humor segment prediction in video

Cited by: 0
Authors
Zekun Yang
Yuta Nakashima
Haruo Takemura
Affiliations
[1] Nagoya University, Information Technology Center
[2] Osaka University, Institute for Datability Science
[3] Osaka University, Cyber Media Center
Source
Multimedia Systems, 2023, Vol. 29
Keywords
Humor prediction; Vision and language; Multi-modal
DOI: not available
Abstract
Humor can be induced by various signals in the visual, linguistic, and vocal modalities emitted by humans. Finding humor in videos is an interesting but challenging task for an intelligent system. Previous methods predict humor at the sentence level given some text (e.g., a speech transcript), sometimes together with other modalities, such as video and speech. By design, such methods ignore humor caused by the visual modality, since their prediction is made per sentence. In this work, we first provide new humor annotations for a sitcom by deriving temporal segments of ground-truth humor from the laughter track. We then propose a method to find these temporal segments of humor. We adopt a sliding-window approach, where the visual modality is described by pose and facial features, and the linguistic modality is given as the subtitles within each window. We use long short-term memory networks to encode the temporal dependency in poses and facial features, and a pre-trained BERT to handle subtitles. Experimental results show that our method improves the performance of humor prediction.
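The architecture described in the abstract (per-modality LSTM encoders fused with a BERT subtitle representation, classified per sliding window) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the feature dimensions (`pose_dim`, `face_dim`), the single-layer LSTMs, and the late-fusion classifier head are all assumptions, and the BERT subtitle embedding is taken as a pre-computed 768-dimensional vector rather than run through the `transformers` library here.

```python
import torch
import torch.nn as nn


class HumorWindowClassifier(nn.Module):
    """Sketch of a sliding-window humor classifier: two LSTMs summarize
    pose and facial feature sequences, and their final hidden states are
    concatenated with a (pre-computed) BERT subtitle embedding."""

    def __init__(self, pose_dim=50, face_dim=35, text_dim=768, hidden=128):
        super().__init__()
        self.pose_lstm = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.face_lstm = nn.LSTM(face_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(2 * hidden + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # one humor/non-humor logit per window
        )

    def forward(self, pose_seq, face_seq, text_emb):
        # Final hidden state of each LSTM summarizes the window's dynamics.
        _, (h_pose, _) = self.pose_lstm(pose_seq)
        _, (h_face, _) = self.face_lstm(face_seq)
        fused = torch.cat([h_pose[-1], h_face[-1], text_emb], dim=-1)
        return self.head(fused)


# One batch of 2 sliding windows, each 16 frames long.
model = HumorWindowClassifier()
logits = model(
    torch.randn(2, 16, 50),   # pose features per frame
    torch.randn(2, 16, 35),   # facial features per frame
    torch.randn(2, 768),      # BERT embedding of the window's subtitles
)
print(logits.shape)  # torch.Size([2, 1])
```

At inference time, the window would slide over the video and each window's logit would be thresholded to mark temporal humor segments.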
Pages: 2389–2398 (9 pages)