AFFECTIVE VIDEO CONTENT ANALYSES BY USING CROSS-MODAL EMBEDDING LEARNING FEATURES

Cited by: 0
Authors
Li, Benchao [1 ,4 ]
Chen, Zhenzhong [2 ,4 ]
Li, Shan [4 ]
Zheng, Wei-Shi [3 ,5 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Hubei, Peoples R China
[3] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Guangdong, Peoples R China
[4] Tencent, Palo Alto, CA 94306 USA
[5] Minist Educ, Key Lab Machine Intelligence & Adv Comp, Beijing, Peoples R China
Keywords
Affective Video Content Analyses; Cross-modal Embedding; Learning Features;
DOI
10.1109/ICME.2019.00150
CLC Classification
TP31 [Computer Software];
Subject Classification Codes
081202 ; 0835 ;
Abstract
Most existing methods for affective video content analysis focus on a single modality, either visual content or audio content, and few attempts have been made to analyze the two signals jointly. In this paper, we employ a cross-modal embedding learning approach to learn compact feature representations of the different modalities that are discriminative for analyzing the emotional attributes of a video. Specifically, we introduce inter-modal and intra-modal similarity constraints to guide the joint embedding learning procedure toward robust features. To capture cues at different granularities, global and local features are extracted from both the visual and audio signals, and a unified framework consisting of global and local feature embedding networks is built for affective video content analysis. Experiments show that our proposed approach significantly outperforms state-of-the-art methods, demonstrating its effectiveness.
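The two constraint types described in the abstract can be sketched as loss terms over batches of paired visual and audio embeddings. The sketch below is an illustrative reconstruction, not the authors' implementation: the hinge margin, the use of cosine similarity, and the specific form of the intra-modal term are all assumptions. The inter-modal loss pulls a clip's visual and audio embeddings together relative to other clips; the intra-modal loss encourages the two modalities to agree on the within-modality similarity structure.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Project each embedding onto the unit sphere so cosine
    # similarity reduces to a dot product.
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def inter_modal_loss(v, a, margin=0.2):
    # Hinge-style constraint: each visual embedding should be closer to
    # its paired audio embedding than to any other clip's audio embedding
    # (and symmetrically for audio anchors). Matched pairs share a row index.
    v, a = l2_normalize(v), l2_normalize(a)
    sim = v @ a.T                       # (n, n) cross-modal similarities
    pos = np.diag(sim)                  # matched pairs on the diagonal
    cost_v = np.maximum(0.0, margin + sim - pos[:, None])  # visual anchors
    cost_a = np.maximum(0.0, margin + sim - pos[None, :])  # audio anchors
    np.fill_diagonal(cost_v, 0.0)       # exclude the positive pair itself
    np.fill_diagonal(cost_a, 0.0)
    return (cost_v.sum() + cost_a.sum()) / v.shape[0]

def intra_modal_loss(v, a):
    # Penalize disagreement between the visual-visual and audio-audio
    # similarity matrices, preserving neighborhood structure per modality.
    v, a = l2_normalize(v), l2_normalize(a)
    return np.mean((v @ v.T - a @ a.T) ** 2)
```

In a full system these terms would be combined with a task loss and minimized by gradient descent over the embedding networks; here they are written in NumPy only to make the constraints concrete.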
Pages: 844-849 (6 pages)
Related Papers (50 in total)
  • [1] Learning Cross-Modal Contrastive Features for Video Domain Adaptation
    Kim, Donghyun
    Tsai, Yi-Hsuan
    Zhuang, Bingbing
    Yu, Xiang
    Sclaroff, Stan
    Saenko, Kate
    Chandraker, Manmohan
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 13598 - 13607
  • [2] Cross-modal Metric Learning with Graph Embedding
    Zhang, Youcai
    Gu, Xiaodong
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018, : 758 - 764
  • [3] Knowledge graph embedding by fusing multimodal content via cross-modal learning
    Liu, Shi
    Li, Kaiyang
    Wang, Yaoying
    Zhu, Tianyou
    Li, Jiwei
    Chen, Zhenyu
    MATHEMATICAL BIOSCIENCES AND ENGINEERING, 2023, 20 (08) : 14180 - 14200
  • [4] Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval
    Mithun, Niluthpol Chowdhury
    Li, Juncheng
    Metze, Florian
    Roy-Chowdhury, Amit K.
    ICMR '18: PROCEEDINGS OF THE 2018 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2018, : 19 - 27
  • [5] Neighbourhood Structure Preserving Cross-Modal Embedding for Video Hyperlinking
    Hao, Yanbin
    Ngo, Chong-Wah
    Huet, Benoit
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (01) : 188 - 200
  • [6] Learning Cross-Modal Aligned Representation With Graph Embedding
    Zhang, Youcai
    Cao, Jiayan
    Gu, Xiaodong
    IEEE ACCESS, 2018, 6 : 77321 - 77333
  • [7] Graph Embedding Learning for Cross-Modal Information Retrieval
    Zhang, Youcai
    Gu, Xiaodong
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT III, 2017, 10636 : 594 - 601
  • [8] Rich Features Embedding for Cross-Modal Retrieval: A Simple Baseline
    Fu, Xin
    Zhao, Yao
    Wei, Yunchao
    Zhao, Yufeng
    Wei, Shikui
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (09) : 2354 - 2365
  • [9] CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
    Zolfaghari, Mohammadreza
    Zhu, Yi
    Gehler, Peter
    Brox, Thomas
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 1430 - 1439
  • [10] Semantic-enhanced discriminative embedding learning for cross-modal retrieval
    Pan, Hao
    Huang, Jun
    International Journal of Multimedia Information Retrieval, 2022, 11 : 369 - 382