Cross-modal Representation Learning with Nonlinear Dimensionality Reduction

Cited by: 0
Authors
Kaya, Semih [1]
Vural, Elif [1]
Affiliation
[1] Orta Dogu Tekn Univ, Elektr & Elekt Muhendisligi Bolumu, Ankara, Turkey
Keywords
Cross-modal learning; multi-view learning; nonlinear projections
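DOI
Not available
Chinese Library Classification
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809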
Abstract
In many machine learning problems, data collections from different modalities are related to one another. Multi-modal learning algorithms aim to use the information present in the different modalities efficiently when solving multi-modal retrieval problems. This work proposes a multi-modal representation learning algorithm based on nonlinear dimensionality reduction. Compared to linear dimensionality reduction methods, nonlinear methods provide more flexible representations, especially when the structures of the different modalities differ strongly. The proposed method aligns the modalities by mapping same-class training data from different modalities to nearby coordinates, while also learning a Lipschitz-continuous interpolation function that generalizes the learnt representation to the whole data space. Experiments in image-text retrieval applications show that the proposed method performs well in comparison with multi-modal learning methods in the literature.
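The idea in the abstract (same-class samples from different modalities land at nearby coordinates, and a smooth interpolator extends the embedding to unseen points) can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify the objective or the interpolator, so everything here is an assumption made for illustration. The training coordinates are simply fixed to one-hot class anchors, the interpolator is a ridge-regularized Gaussian RBF fit (which is smooth with bounded gradient, hence Lipschitz continuous), and the data, dimensions, `sigma`, and `reg` are hypothetical.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    # Gaussian kernel matrix between the rows of A and the rows of B.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sigma**2)

def fit_rbf(X, Y, sigma, reg=1e-3):
    # Coefficients C such that (K + reg * I) C = Y (ridge-regularized fit).
    K = rbf_kernel(X, X, sigma)
    return np.linalg.solve(K + reg * np.eye(len(X)), Y)

def embed(Xq, Xtrain, C, sigma):
    # Evaluate the fitted interpolator at new points Xq.
    return rbf_kernel(Xq, Xtrain, sigma) @ C

def sample(means, labels, rng):
    # Toy data: class mean plus unit Gaussian noise (assumed, not from the paper).
    return means[labels] + rng.normal(size=(len(labels), means.shape[1]))

rng = np.random.default_rng(0)
n_classes, sigma = 3, 4.0
train_labels = np.repeat(np.arange(n_classes), 20)
test_labels = np.repeat(np.arange(n_classes), 10)

# Two toy "modalities" with different dimensions (5 and 8).
means_a = 5.0 * np.eye(n_classes, 5)
means_b = 5.0 * np.eye(n_classes, 8)
Xa_train, Xb_train = sample(means_a, train_labels, rng), sample(means_b, train_labels, rng)
Xa_test, Xb_test = sample(means_a, test_labels, rng), sample(means_b, test_labels, rng)

# Same-class training samples of both modalities share one target coordinate.
targets = np.eye(n_classes)[train_labels]
Ca = fit_rbf(Xa_train, targets, sigma)
Cb = fit_rbf(Xb_train, targets, sigma)

# Embed unseen samples of each modality into the common space.
Za = embed(Xa_test, Xa_train, Ca, sigma)
Zb = embed(Xb_test, Xb_train, Cb, sigma)

# Cross-modal retrieval: for each modality-a query, nearest modality-b item.
dists = ((Za[:, None, :] - Zb[None, :, :]) ** 2).sum(-1)
retrieved = test_labels[dists.argmin(axis=1)]
accuracy = float((retrieved == test_labels).mean())
print("top-1 cross-modal retrieval accuracy:", accuracy)
```

In this sketch a larger `sigma` or `reg` gives a flatter, smoother interpolator at the cost of fitting the training targets less closely; how the actual method balances the two is not stated in the abstract.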
Pages: 4