Multi-modal humor segment prediction in video

Cited by: 0
Authors
Zekun Yang
Yuta Nakashima
Haruo Takemura
Affiliations
[1] Nagoya University, Information Technology Center
[2] Osaka University, Institute for Datability Science
[3] Osaka University, Cyber Media Center
Source
Multimedia Systems | 2023, Vol. 29
Keywords
Humor prediction; Vision and language; Multi-modal
DOI
Not available
Abstract
Humor can be induced by various signals in the visual, linguistic, and vocal modalities emitted by humans. Finding humor in videos is an interesting but challenging task for an intelligent system. Previous methods predict humor at the sentence level given some text (e.g., a speech transcript), sometimes together with other modalities such as video and speech. Because their predictions are made per sentence, such methods by design ignore humor caused by the visual modality alone. In this work, we first provide new humor annotations for a sitcom by defining temporal segments of ground-truth humor derived from the laughter track. We then propose a method to find these temporal humor segments. We adopt a sliding-window approach, in which the visual modality is described by pose and facial features and the linguistic modality is given by the subtitles within each window. We use long short-term memory networks to encode the temporal dependency in the pose and facial features and a pre-trained BERT model to handle the subtitles. Experimental results show that our method improves the performance of humor prediction.
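To make the pipeline described in the abstract concrete, below is a minimal sketch of a sliding-window, multi-modal humor classifier, assuming PyTorch and HuggingFace Transformers. The feature dimensions, the fusion by concatenation, and the classifier head are illustrative assumptions; the abstract does not specify these details.

```python
# Minimal sketch (not the authors' implementation): per-window pose and facial
# feature sequences are encoded with LSTMs, the window's subtitles with a
# pre-trained BERT, and the fused embedding is classified as humor / non-humor.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class HumorSegmentClassifier(nn.Module):
    def __init__(self, pose_dim=50, face_dim=136, hidden=128):
        super().__init__()
        # Separate LSTMs encode the temporal dependency of pose and facial features.
        self.pose_lstm = nn.LSTM(pose_dim, hidden, batch_first=True)
        self.face_lstm = nn.LSTM(face_dim, hidden, batch_first=True)
        # Pre-trained BERT encodes the subtitles that fall inside the window.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        # Fuse the three modality embeddings (assumed: simple concatenation).
        self.classifier = nn.Sequential(
            nn.Linear(hidden * 2 + self.bert.config.hidden_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, pose_seq, face_seq, input_ids, attention_mask):
        # pose_seq: (batch, frames, pose_dim); face_seq: (batch, frames, face_dim)
        _, (pose_h, _) = self.pose_lstm(pose_seq)
        _, (face_h, _) = self.face_lstm(face_seq)
        text_h = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).pooler_output
        fused = torch.cat([pose_h[-1], face_h[-1], text_h], dim=-1)
        return self.classifier(fused)  # logits over {non-humor, humor}

# Usage sketch: slide a fixed-length window over the episode, classify each
# window, and compare positive windows with laughter-track-derived segments.
tok = BertTokenizer.from_pretrained("bert-base-uncased")
model = HumorSegmentClassifier()
enc = tok(["example subtitle text inside the window"],
          return_tensors="pt", padding=True, truncation=True)
pose = torch.randn(1, 30, 50)   # 30 frames of hypothetical 2D pose keypoints
face = torch.randn(1, 30, 136)  # 30 frames of hypothetical 68-landmark coordinates
logits = model(pose, face, enc["input_ids"], enc["attention_mask"])
```

The window length, the stride, and how per-window predictions are mapped back to labeled humor segments are design choices of the method that the abstract does not state.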
Pages: 2389-2398
Page count: 9
Related papers
50 records in total
  • [21] Multi-modal Laughter Recognition in Video Conversations
    Escalera, Sergio
    Puertas, Eloi
    Radeva, Petia
    Pujol, Oriol
    2009 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPR WORKSHOPS 2009), VOLS 1 AND 2, 2009, : 869 - 874
  • [22] Multi-modal tracking of faces for video communications
    Crowley, JL
    Berard, F
    1997 IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, PROCEEDINGS, 1997, : 640 - 645
  • [23] The Multi-Modal Video Reasoning and Analyzing Competition
    Peng, Haoran
    Huang, He
    Xu, Li
    Li, Tianjiao
    Liu, Jun
    Rahmani, Hossein
    Ke, Qiuhong
    Guo, Zhicheng
    Wu, Cong
    Li, Rongchang
    Ye, Mang
    Wang, Jiahao
    Zhang, Jiaxu
    Liu, Yuanzhong
    He, Tao
    Zhang, Fuwei
    Liu, Xianbin
    Lin, Tao
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW 2021), 2021, : 806 - 813
  • [24] HUMOR: a HUman MOtion retrieval system with multi-modal queries
    Wu, MY
    Wu, YC
    Chiu, CY
    Chao, SP
    Yang, SN
2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1-3, 2004, : 315 - 316
  • [25] An approach to multi-modal multi-view video coding
    Zhang, Yun
    Jiang, Gangyi
    Yi, Wenjuan
    Yu, Mei
    Jiang, Zhidi
    Kim, Yong Deak
    2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4, 2006, : 1405 - +
  • [26] MULTI-MODAL PREDICTION OF PTSD AND STRESS INDICATORS
    Rozgic, Viktor
    Vazquez-Reina, Amelio
    Crystal, Michael
    Srivastava, Amit
    Tan, Veasna
    Berka, Chris
    2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2014,
  • [27] Multi-Modal Graph Learning for Disease Prediction
    Zheng, Shuai
    Zhu, Zhenfeng
    Liu, Zhizhe
    Guo, Zhenyu
    Liu, Yang
    Yang, Yuchen
    Zhao, Yao
    IEEE TRANSACTIONS ON MEDICAL IMAGING, 2022, 41 (09) : 2207 - 2216
  • [28] Multi-Modal Trajectory Prediction of NBA Players
    Hauri, Sandro
    Djuric, Nemanja
    Radosavljevic, Vladan
    Vucetic, Slobodan
    2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021, : 1639 - 1648
  • [29] Multi-modal graph learning for disease prediction
    Zheng, Shuai
    Zhu, Zhenfeng
    Liu, Zhizhe
    Guo, Zhenyu
    Liu, Yang
    Zhao, Yao
    arXiv, 2021,
  • [30] A multi-modal approach to story segmentation for news video
    Chaisorn, L
    Chua, TS
    Lee, CH
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2003, 6 (02): : 187 - 208