Multi-modal humor segment prediction in video

Cited by: 0
Authors
Zekun Yang
Yuta Nakashima
Haruo Takemura
Affiliations
[1] Nagoya University, Information Technology Center
[2] Osaka University, Institute for Datability Science
[3] Osaka University, Cyber Media Center
Source
Multimedia Systems | 2023, Vol. 29
Keywords
Humor prediction; Vision and language; Multi-modal
DOI
Not available
Abstract
Humor can be induced by various signals in the visual, linguistic, and vocal modalities emitted by humans. Finding humor in videos is an interesting but challenging task for an intelligent system. Previous methods predict humor at the sentence level given some text (e.g., a speech transcript), sometimes together with other modalities such as video and speech. Because their predictions are made per sentence, such methods by design ignore humor caused by the visual modality alone. In this work, we first provide new humor annotations for a sitcom by defining temporal segments of ground-truth humor derived from the laughter track. We then propose a method to find these temporal humor segments. We adopt a sliding-window approach, in which the visual modality is described by pose and facial features and the linguistic modality is given by the subtitles within each window. We use long short-term memory networks to encode the temporal dependency in the pose and facial features and a pre-trained BERT model to handle the subtitles. Experimental results show that our method improves the performance of humor prediction.
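To make the pipeline described in the abstract concrete, below is a minimal sketch of a sliding-window, multi-modal humor classifier, assuming PyTorch and HuggingFace Transformers. The feature dimensions, the fusion by concatenation, and the classifier head are illustrative assumptions; the abstract does not specify these details.

```python
# Minimal sketch (not the authors' implementation): per-window pose and facial
# feature sequences are encoded with LSTMs, the window's subtitles with a
# pre-trained BERT, and the fused embedding is classified as humor / non-humor.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class HumorSegmentClassifier(nn.Module):
    def __init__(self, pose_dim=50, face_dim=136, hidden=128):
        super().__init__()
        # Separate LSTMs encode the temporal dependency of pose and facial features.
        self.pose_lstm = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.face_lstm = nn.LSTM(face_dim, hidden, batch_first=True)
        # Pre-trained BERT encodes the subtitles that fall inside the window.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Fuse the three modality embeddings (assumed: simple concatenation).
        self.classifier = nn.Sequential(
            nn.Linear(hidden * 2 + self.bert.config.hidden_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, pose_seq, face_seq, input_ids, attention_mask):
        # pose_seq: (batch, frames, pose_dim); face_seq: (batch, frames, face_dim)
        _, (pose_h, _) = self.pose_lstm(pose_seq)
        _, (face_h, _) = self.face_lstm(face_seq)
        text_h = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).pooler_output
        fused = torch.cat([pose_h[-1], face_h[-1], text_h], dim=-1)
        return self.classifier(fused)  # logits over {non-humor, humor}

# Usage sketch: slide a fixed-length window over the episode, classify each
# window, and compare positive windows with laughter-track-derived segments.
tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = HumorSegmentClassifier()
enc = tok(["example subtitle text inside the window"],
          return_tensors="pt", padding=True, truncation=True)
pose = torch.randn(1, 30, 50)   # 30 frames of hypothetical 2D pose keypoints
face = torch.randn(1, 30, 136)  # 30 frames of hypothetical 68-landmark coordinates
logits = model(pose, face, enc["input_ids"], enc["attention_mask"])
```

The window length, the stride, and how per-window predictions are mapped back to labeled humor segments are design choices of the method that the abstract does not state.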
Pages: 2389-2398
Page count: 9
Related papers
50 records in total
  • [21] Multi-modal Laughter Recognition in Video Conversations
    Escalera, Sergio
    Puertas, Eloi
    Radeva, Petia
    Pujol, Oriol
    2009 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPR WORKSHOPS 2009), VOLS 1 AND 2, 2009, : 869 - 874
  • [22] Multi-modal tracking of faces for video communications
    Crowley, JL
    Berard, F
    1997 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, PROCEEDINGS, 1997, : 640 - 645
  • [23] The Multi-Modal Video Reasoning and Analyzing Competition
    Peng, Haoran
    Huang, He
    Xu, Li
    Li, Tianjiao
    Liu, Jun
    Rahmani, Hossein
    Ke, Qiuhong
    Guo, Zhicheng
    Wu, Cong
    Li, Rongchang
    Ye, Mang
    Wang, Jiahao
    Zhang, Jiaxu
    Liu, Yuanzhong
    He, Tao
    Zhang, Fuwei
    Liu, Xianbin
    Lin, Tao
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 806 - 813
  • [24] HUMOR: a HUman MOtion retrieval system with multi-modal queries
    Wu, MY
    Wu, YC
    Chiu, CY
    Chao, SP
    Yang, SN
2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1-3, 2004, : 315 - 316
  • [25] An approach to multi-modal multi-view video coding
    Zhang, Yun
    Jiang, Gangyi
    Yi, Wenjuan
    Yu, Mei
    Jiang, Zhidi
    Kim, Yong Deak
    2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4, 2006, : 1405 - +
  • [26] MULTI-MODAL PREDICTION OF PTSD AND STRESS INDICATORS
    Rozgic, Viktor
    Vazquez-Reina, Amelio
    Crystal, Michael
    Srivastava, Amit
    Tan, Veasna
    Berka, Chris
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [27] Multi-Modal Graph Learning for Disease Prediction
    Zheng, Shuai
    Zhu, Zhenfeng
    Liu, Zhizhe
    Guo, Zhenyu
    Liu, Yang
    Yang, Yuchen
    Zhao, Yao
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2022, 41 (09) : 2207 - 2216
  • [28] Multi-Modal Trajectory Prediction of NBA Players
    Hauri, Sandro
    Djuric, Nemanja
    Radosavljevic, Vladan
    Vucetic, Slobodan
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1639 - 1648
  • [29] Multi-modal graph learning for disease prediction
    Zheng, Shuai
    Zhu, Zhenfeng
    Liu, Zhizhe
    Guo, Zhenyu
    Liu, Yang
    Zhao, Yao
    arXiv, 2021,
  • [30] A multi-modal approach to story segmentation for news video
    Chaisorn, L
    Chua, TS
    Lee, CH
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2003, 6 (02): : 187 - 208