AFFECTIVE VIDEO CONTENT ANALYSES BY USING CROSS-MODAL EMBEDDING LEARNING FEATURES

Times Cited: 0
Authors
Li, Benchao [1 ,4 ]
Chen, Zhenzhong [2 ,4 ]
Li, Shan [4 ]
Zheng, Wei-Shi [3 ,5 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Hubei, Peoples R China
[3] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Guangdong, Peoples R China
[4] Tencent, Palo Alto, CA 94306 USA
[5] Minist Educ, Key Lab Machine Intelligence & Adv Comp, Beijing, Peoples R China
Keywords
Affective Video Content Analyses; Cross-modal Embedding; Learning Features
DOI
10.1109/ICME.2019.00150
CLC Classification
TP31 [Computer Software];
Subject Classification Codes
081202 ; 0835 ;
Abstract
Most existing methods for affective video content analysis are dedicated to a single modality, either visual or audio content, and few attempts have been made to analyze the two signals jointly. In this paper, we employ a cross-modal embedding learning approach to learn compact feature representations of the different modalities that are discriminative for analyzing the emotional attributes of a video. Specifically, we introduce inter-modal and intra-modal similarity constraints to guide the joint embedding learning procedure toward robust features. To capture cues at different granularities, global and local features are extracted from both the visual and audio signals, and a unified framework consisting of global and local feature embedding networks is built for affective video content analysis. Experiments show that our proposed approach significantly outperforms state-of-the-art methods, demonstrating its effectiveness.
Pages: 844-849
Page Count: 6
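
To make the abstract's idea concrete, below is a minimal sketch of joint embedding learning under inter-modal and intra-modal similarity constraints. It is illustrative only: the names (EmbeddingNet, inter_modal_loss, intra_modal_loss), network sizes, margin, and exact loss forms are assumptions rather than the authors' formulation, and for brevity a single embedding network per modality stands in for the paper's separate global and local embedding networks.

```python
# Minimal sketch of cross-modal joint embedding with inter-/intra-modal
# similarity constraints. All names, dimensions, and loss forms here are
# illustrative assumptions, NOT the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Projects one modality's features into a shared embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-normalize so dot products below act as cosine similarities.
        return F.normalize(self.net(x), dim=-1)

def inter_modal_loss(v: torch.Tensor, a: torch.Tensor,
                     margin: float = 0.2) -> torch.Tensor:
    """Pull paired visual/audio embeddings together and push mismatched
    in-batch pairs apart (triplet-style hinge)."""
    sim = v @ a.t()                      # (B, B) cosine similarities
    pos = sim.diag().view(-1, 1)         # matched pairs sit on the diagonal
    cost = F.relu(margin + sim - pos)    # margin violations incur cost
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return cost.masked_fill(mask, 0.0).mean()

def intra_modal_loss(emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Encourage samples sharing an emotion label to stay close within
    one modality."""
    sim = emb @ emb.t()
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    pos = sim[same & not_self]
    # 1 - cosine similarity as a distance; zero if no same-label pair exists.
    return (1.0 - pos).mean() if pos.numel() > 0 else sim.new_zeros(())

if __name__ == "__main__":
    B, D_VIS, D_AUD = 8, 512, 128        # batch and feature sizes (assumed)
    vis_net, aud_net = EmbeddingNet(D_VIS), EmbeddingNet(D_AUD)
    v = vis_net(torch.randn(B, D_VIS))   # visual features -> shared space
    a = aud_net(torch.randn(B, D_AUD))   # audio features -> shared space
    emotions = torch.randint(0, 4, (B,)) # toy emotion labels
    loss = (inter_modal_loss(v, a)
            + intra_modal_loss(v, emotions)
            + intra_modal_loss(a, emotions))
    loss.backward()
    print(f"joint embedding loss: {loss.item():.4f}")
```

Normalizing the embeddings keeps both constraints on the same cosine scale, so the inter-modal and intra-modal terms can simply be summed into one training objective.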