Multi-modal humor segment prediction in video

Cited by: 0
Authors
Zekun Yang
Yuta Nakashima
Haruo Takemura
Affiliations
[1] Nagoya University,Information Technology Center
[2] Osaka University,Institute for Datability Science
[3] Osaka University,Cyber Media Center
Source
Multimedia Systems | 2023, Vol. 29
Keywords
Humor prediction; Vision and language; Multi-modal
DOI
Not available
Abstract
Humor can be induced by various signals in the visual, linguistic, and vocal modalities emitted by humans. Finding humor in videos is an interesting but challenging task for an intelligent system. Previous methods predict humor at the sentence level given some text (e.g., a speech transcript), sometimes together with other modalities such as video and speech. Because their predictions are made per sentence, such methods by design overlook humor caused by the visual modality. In this work, we first create new humor annotations for a sitcom by defining temporal segments of ground-truth humor derived from the laughter track. We then propose a method to find these temporal humor segments. We adopt a sliding-window approach in which, for each window, the visual modality is described by pose and facial features and the linguistic modality is given by the subtitles. We use long short-term memory networks to encode the temporal dependencies in poses and facial features and pre-trained BERT to handle the subtitles. Experimental results show that our method improves humor prediction performance.
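The abstract outlines a sliding-window architecture: per-window pose and facial feature sequences encoded by LSTMs, subtitles encoded by pre-trained BERT, and the fused representation classified as humorous or not. Below is a minimal sketch of that kind of model, not the authors' implementation; the class name, the feature dimensions (50-d pose, 136-d facial landmarks, 768-d BERT embedding), and the assumption that the subtitle embedding is precomputed are illustrative only.

```python
import torch
import torch.nn as nn


class HumorSegmentClassifier(nn.Module):
    """Sliding-window humor classifier sketch: LSTM encoders for pose and
    facial feature sequences, fused with a precomputed BERT subtitle embedding."""

    def __init__(self, pose_dim=50, face_dim=136, text_dim=768, hidden_dim=128):
        super().__init__()
        # One LSTM per visual stream to capture temporal dependencies
        # within the sliding window.
        self.pose_lstm = nn.LSTM(pose_dim, hidden_dim, batch_first=True)
        self.face_lstm = nn.LSTM(face_dim, hidden_dim, batch_first=True)
        # Project the subtitle (BERT) embedding to the same hidden size.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        # Binary classifier over the concatenated modalities.
        self.classifier = nn.Sequential(
            nn.Linear(3 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, pose_seq, face_seq, subtitle_emb):
        # pose_seq: (batch, frames, pose_dim); face_seq: (batch, frames, face_dim)
        # subtitle_emb: (batch, text_dim), e.g. a BERT [CLS] vector for the
        # subtitles overlapping the window (assumed to be precomputed).
        _, (pose_h, _) = self.pose_lstm(pose_seq)  # final hidden states
        _, (face_h, _) = self.face_lstm(face_seq)
        fused = torch.cat(
            [pose_h[-1], face_h[-1], self.text_proj(subtitle_emb)], dim=-1
        )
        return self.classifier(fused).squeeze(-1)  # one humor logit per window


# Usage example: score a single window of 30 frames with random features.
model = HumorSegmentClassifier()
logit = model(torch.randn(1, 30, 50), torch.randn(1, 30, 136), torch.randn(1, 768))
print(torch.sigmoid(logit))  # estimated probability that the window is humorous
```

Sliding the window over the video and thresholding the per-window probabilities would then yield the predicted temporal humor segments.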
Pages: 2389-2398
Number of pages: 9
Related papers
50 items in total
  • [41] Overview of Tencent Multi-modal Ads Video Understanding
    Wang, Zhenzhi
    Li, Zhimin
    Wu, Liyu
    Xiong, Jiangfeng
    Lu, Qinglin
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 4725 - 4729
  • [42] VTLayout: A Multi-Modal Approach for Video Text Layout
    Zhao, Yuxuan
    Ma, Jin
    Qi, Zhongang
    Xie, Zehua
    Luo, Yu
    Kang, Qiusheng
    Shan, Ying
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 2775 - 2784
  • [43] Multi-modal tag localization for mobile video search
    Zhang, Rui
    Tang, Sheng
    Liu, Wu
    Zhang, Yongdong
    Li, Jintao
    MULTIMEDIA SYSTEMS, 2017, 23 (06) : 713 - 724
  • [44] Hierarchical multi-modal video summarization with dynamic sampling
    Yu, Lingjian
    Zhao, Xing
    Xie, Liang
    Liang, Haoran
    Liang, Ronghua
    IET IMAGE PROCESSING, 2024, 18 (14) : 4577 - 4588
  • [45] On Pursuit of Designing Multi-modal Transformer for Video Grounding
    Cao, Meng
    Chen, Long
    Shou, Zheng
    Zhang, Can
    Zou, Yuexian
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 9810 - 9823
  • [46] Silhouette Coverage Analysis for Multi-modal Video Surveillance
    Verstockt, S.
    Poppe, C.
    De Potter, P.
    Hollemeersch, C.
    Van Hoecke, S.
    Lambert, P.
    Van de Walle, R.
    PIERS 2011 MARRAKESH: PROGRESS IN ELECTROMAGNETICS RESEARCH SYMPOSIUM, 2011, : 1279 - 1283
  • [47] Multi-modal People Detection from Aerial Video
    Flynn, Helen
    Cameron, Stephen
    PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON COMPUTER RECOGNITION SYSTEMS CORES 2013, 2013, 226 : 815 - 824
  • [48] Multi-Modal Query Expansion for Web Video Search
    Feng, Bailan
    Cao, Juan
    Chen, Zhineng
    Zhang, Yongdong
    Lin, Shouxun
    SIGIR 2010: PROCEEDINGS OF THE 33RD ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2010, : 721 - 722
  • [49] Multi-modal Video Dialog State Tracking in the Wild
    Abdessaied, Adnen
    Shi, Lei
    Bulling, Andreas
    COMPUTER VISION-ECCV 2024, PT LVII, 2025, 15115 : 348 - 365
  • [50] A comprehensive video dataset for multi-modal recognition systems
    Handa, A.
    Agarwal, R.
    Kohli, N.
    DATA SCIENCE JOURNAL, 2019, 18 (1)