AFFECTIVE VIDEO CONTENT ANALYSES BY USING CROSS-MODAL EMBEDDING LEARNING FEATURES

Times Cited: 0
Authors
Li, Benchao [1 ,4 ]
Chen, Zhenzhong [2 ,4 ]
Li, Shan [4 ]
Zheng, Wei-Shi [3 ,5 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Hubei, Peoples R China
[3] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Guangdong, Peoples R China
[4] Tencent, Palo Alto, CA 94306 USA
[5] Minist Educ, Key Lab Machine Intelligence & Adv Comp, Beijing, Peoples R China
Keywords
Affective Video Content Analyses; Cross-modal Embedding; Learning Features
DOI
10.1109/ICME.2019.00150
CLC Classification
TP31 [Computer Software];
Subject Classification Codes
081202 ; 0835 ;
Abstract
Most existing methods for affective video content analysis are dedicated to a single modality, either visual or audio content, and few attempts have been made to analyze the two signals jointly. In this paper, we employ a cross-modal embedding learning approach to learn compact feature representations of the different modalities that are discriminative for analyzing the emotional attributes of a video. Specifically, we introduce inter-modal and intra-modal similarity constraints to guide the joint embedding learning procedure toward robust features. To capture cues at different granularities, global and local features are extracted from both the visual and audio signals, and a unified framework consisting of global and local feature embedding networks is built for affective video content analysis. Experiments show that our proposed approach significantly outperforms state-of-the-art methods, demonstrating its effectiveness.
Pages: 844-849
Page Count: 6
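
To make the abstract's idea concrete, below is a minimal sketch of joint embedding learning under inter-modal and intra-modal similarity constraints. It is illustrative only: the names (EmbeddingNet, inter_modal_loss, intra_modal_loss), network sizes, margin, and exact loss forms are assumptions rather than the authors' formulation, and for brevity a single embedding network per modality stands in for the paper's separate global and local embedding networks.

```python
# Minimal sketch of cross-modal joint embedding with inter-/intra-modal
# similarity constraints. All names, dimensions, and loss forms here are
# illustrative assumptions, NOT the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Projects one modality's features into a shared embedding space."""
    def __init__(self, in_dim: int, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256),
            nn.ReLU(),
            nn.Linear(256, emb_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Unit-normalize so dot products below act as cosine similarities.
        return F.normalize(self.net(x), dim=-1)

def inter_modal_loss(v: torch.Tensor, a: torch.Tensor,
                     margin: float = 0.2) -> torch.Tensor:
    """Pull paired visual/audio embeddings together and push mismatched
    in-batch pairs apart (triplet-style hinge)."""
    sim = v @ a.t()                      # (B, B) cosine similarities
    pos = sim.diag().view(-1, 1)         # matched pairs sit on the diagonal
    cost = F.relu(margin + sim - pos)    # margin violations incur cost
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    return cost.masked_fill(mask, 0.0).mean()

def intra_modal_loss(emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Encourage samples sharing an emotion label to stay close within
    one modality."""
    sim = emb @ emb.t()
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    not_self = ~torch.eye(len(labels), dtype=torch.bool, device=emb.device)
    pos = sim[same & not_self]
    # 1 - cosine similarity as a distance; zero if no same-label pair exists.
    return (1.0 - pos).mean() if pos.numel() > 0 else sim.new_zeros(())

if __name__ == "__main__":
    B, D_VIS, D_AUD = 8, 512, 128        # batch and feature sizes (assumed)
    vis_net, aud_net = EmbeddingNet(D_VIS), EmbeddingNet(D_AUD)
    v = vis_net(torch.randn(B, D_VIS))   # visual features -> shared space
    a = aud_net(torch.randn(B, D_AUD))   # audio features -> shared space
    emotions = torch.randint(0, 4, (B,)) # toy emotion labels
    loss = (inter_modal_loss(v, a)
            + intra_modal_loss(v, emotions)
            + intra_modal_loss(a, emotions))
    loss.backward()
    print(f"joint embedding loss: {loss.item():.4f}")
```

Normalizing the embeddings keeps both constraints on the same cosine scale, so the inter-modal and intra-modal terms can simply be summed into one training objective.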