AFFECTIVE VIDEO CONTENT ANALYSES BY USING CROSS-MODAL EMBEDDING LEARNING FEATURES

Cited by: 0
Authors
Li, Benchao [1 ,4 ]
Chen, Zhenzhong [2 ,4 ]
Li, Shan [4 ]
Zheng, Wei-Shi [3 ,5 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Informat Engn, Guangzhou, Guangdong, Peoples R China
[2] Wuhan Univ, Sch Remote Sensing & Informat Engn, Wuhan, Hubei, Peoples R China
[3] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou, Guangdong, Peoples R China
[4] Tencent, Palo Alto, CA 94306 USA
[5] Minist Educ, Key Lab Machine Intelligence & Adv Comp, Beijing, Peoples R China
Keywords
Affective Video Content Analyses; Cross-modal Embedding; Learning Features;
DOI
10.1109/ICME.2019.00150
CLC number
TP31 [Computer Software];
Subject classification codes
081202; 0835;
Abstract
Most existing methods for affective video content analysis focus on a single modality, either visual or audio content, and few attempts have been made to analyze the two signals jointly. In this paper, we employ a cross-modal embedding learning approach to learn compact feature representations of the different modalities that are discriminative for analyzing the emotional attributes of a video. Specifically, we introduce inter-modal and intra-modal similarity constraints that guide the joint embedding learning procedure toward robust features. To capture cues at different granularities, global and local features are extracted from both the visual and audio signals, and a unified framework consisting of global and local feature embedding networks is built for affective video content analysis. Experiments show that our proposed approach significantly outperforms state-of-the-art methods, demonstrating its effectiveness.
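The abstract describes joint embedding learning driven by inter-modal and intra-modal similarity constraints. Below is a minimal, illustrative sketch (not the authors' implementation) of how such constraints could be expressed, assuming a PyTorch setup, margin-based contrastive terms, cosine similarity on L2-normalized embeddings, and emotion-class labels to define similar pairs; the names EmbeddingNet, pairwise_loss, joint_embedding_loss and the margin/weighting values are hypothetical.

```python
# Minimal sketch (assumed formulation, not the paper's code): joint embedding
# with inter-modal and intra-modal similarity constraints, given paired
# visual/audio features and emotion labels for each video.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Projects a modality-specific feature into a shared embedding space."""
    def __init__(self, in_dim, embed_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-length embeddings

def pairwise_loss(emb_a, emb_b, same_emotion, margin=0.5):
    """Pull same-emotion pairs together, push different-emotion pairs apart."""
    dist = 1.0 - (emb_a * emb_b).sum(dim=-1)              # cosine distance matrix
    pos = same_emotion * dist.pow(2)                       # similar pairs: small distance
    neg = (1 - same_emotion) * F.relu(margin - dist).pow(2)  # dissimilar pairs: beyond margin
    return (pos + neg).mean()

def joint_embedding_loss(vis_emb, aud_emb, labels, margin=0.5):
    """Inter-modal (visual vs. audio) plus intra-modal similarity constraints."""
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()  # (B, B) same-emotion mask
    # inter-modal: compare every visual embedding with every audio embedding
    inter = pairwise_loss(vis_emb.unsqueeze(1), aud_emb.unsqueeze(0), same, margin)
    # intra-modal: compare embeddings within each modality
    intra_v = pairwise_loss(vis_emb.unsqueeze(1), vis_emb.unsqueeze(0), same, margin)
    intra_a = pairwise_loss(aud_emb.unsqueeze(1), aud_emb.unsqueeze(0), same, margin)
    return inter + 0.5 * (intra_v + intra_a)
```

In this sketch, the inter-modal term aligns visual and audio embeddings of videos sharing an emotion label, while the intra-modal terms keep each modality's own embedding space discriminative; the paper's actual constraint formulation, feature extractors (global vs. local), and loss weighting may differ.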
Pages: 844-849
Number of pages: 6
Related Papers
50 records in total
  • [41] Cross-Modal Retrieval via Similarity-Preserving Learning and Semantic Average Embedding
    Zhi, Tao
    Fan, Yingchun
    Han, Hong
    IEEE ACCESS, 2020, 8 : 223918 - 223930
  • [42] Learning Joint Embedding with Modality Alignments for Cross-Modal Retrieval of Recipes and Food Images
    Xie, Zhongwei
    Liu, Ling
    Li, Lin
    Zhong, Luo
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 2221 - 2230
  • [43] A multimodal embedding transfer approach for consistent and selective learning processes in cross-modal retrieval
    Zeng, Zhixiong
    He, Shuyi
    Zhang, Yuhao
    Mao, Wenji
    INFORMATION SCIENCES, 2025, 704
  • [44] Cross-modal Manifold Cutmix for Self-supervised Video Representation Learning
    Das, Srijan
    Ryoo, Michael
    2023 18TH INTERNATIONAL CONFERENCE ON MACHINE VISION AND APPLICATIONS, MVA, 2023
  • [45] Self-Supervised Learning by Cross-Modal Audio-Video Clustering
    Alwassel, Humam
    Mahajan, Dhruv
    Korbar, Bruno
    Torresani, Lorenzo
    Ghanem, Bernard
    Tran, Du
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [46] XKD: Cross-Modal Knowledge Distillation with Domain Alignment for Video Representation Learning
    Sarkar, Pritam
    Etemad, Ali
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 13, 2024, : 14875 - 14885
  • [47] Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval
    Jin, Weike
    Zhao, Zhou
    Zhang, Pengcheng
    Zhu, Jieming
    He, Xiuqiang
    Zhuang, Yueting
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 1114 - 1124
  • [48] Cross-Modal Learning with Adversarial Samples
    Li, Chao
    Deng, Cheng
    Gao, Shangqian
    Xie, De
    Liu, Wei
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [49] Auditory and cross-modal implicit learning
    Green, CD
    Groff, P
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 1996, 31 (3-4) : 15442 - 15442
  • [50] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633