Cross-modal Representation Learning with Nonlinear Dimensionality Reduction

Cited by: 0
|
Authors
Kaya, Semih [1 ]
Vural, Elif [1 ]
Affiliations
[1] Orta Dogu Tekn Univ, Elektr & Elekt Muhendisligi Bolumu, Ankara, Turkey
Keywords
Cross-modal learning; multi-view learning; nonlinear projections;
DOI
None available
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
In many machine learning problems, there exist relations between data collections from different modalities. The purpose of multi-modal learning algorithms is to use the information present in the different modalities efficiently when solving multi-modal retrieval problems. In this work, a multi-modal representation learning algorithm based on nonlinear dimensionality reduction is proposed. Compared to linear dimensionality reduction methods, nonlinear methods provide more flexible representations, especially when there is high discrepancy between the structures of the different modalities. We propose to align the different modalities by mapping same-class training data from different modalities to nearby coordinates, while also learning a Lipschitz-continuous interpolation function that generalizes the learnt representation to the whole data space. Experiments in image-text retrieval applications show that the proposed method yields high performance compared to multi-modal learning methods in the literature.
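The alignment idea described in the abstract (same-class training samples from each modality mapped to nearby shared coordinates, with a smooth interpolation function extending the mapping to the whole data space) can be illustrated with a toy sketch. This is not the paper's algorithm: the RBF interpolators, the fixed class anchors, and all parameter values below are illustrative assumptions standing in for the learned nonlinear embedding.

```python
import numpy as np

def rbf_interpolator(X_train, Y_target, gamma=0.1, reg=1e-6):
    """Fit an RBF map f(x) = sum_i w_i * exp(-gamma * ||x - x_i||^2).

    The ridge term `reg` bounds the weight norm, which keeps the
    interpolant smooth (a stand-in for the Lipschitz-continuity
    constraint mentioned in the abstract).
    """
    d2 = ((X_train[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    W = np.linalg.solve(K + reg * np.eye(len(X_train)), Y_target)

    def f(X):
        d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2) @ W

    return f

# Toy data: two modalities with different dimensions, two classes.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
X_img = rng.normal(size=(6, 5)) + labels[:, None] * 3.0   # "image" features
X_txt = rng.normal(size=(6, 8)) - labels[:, None] * 3.0   # "text" features

# Shared target coordinates: same-class samples from both modalities
# are sent near a common class anchor in a 2-D latent space.
anchors = np.array([[0.0, 0.0], [5.0, 5.0]])
Y = anchors[labels]

f_img = rbf_interpolator(X_img, Y)
f_txt = rbf_interpolator(X_txt, Y)

# Cross-modal retrieval: embed both modalities in the shared space
# and rank text samples by distance to each image query.
z_img = f_img(X_img)
z_txt = f_txt(X_txt)
d = ((z_img[:, None, :] - z_txt[None, :, :]) ** 2).sum(-1)
nearest = d.argmin(axis=1)   # index of the closest text sample per image
```

Because the interpolators send the training samples almost exactly to their class anchors, an image query retrieves a text sample of the same class; the interpolation functions then extend this embedding to unseen samples of each modality.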
Pages: 4
Related Papers
50 records in total
  • [41] Auditory and cross-modal implicit learning
    Green, CD
    Groff, P
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 1996, 31 (3-4) : 15442 - 15442
  • [42] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633
  • [43] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16
  • [44] Sequential Learning for Cross-modal Retrieval
    Song, Ge
    Tan, Xiaoyang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4531 - 4539
  • [45] Cross-modal representation of human caretakers in squirrel monkeys
    Adachi, Ikuma
    Fujita, Kazuo
    BEHAVIOURAL PROCESSES, 2007, 74 (01) : 27 - 32
  • [46] Cross-modal hashing retrieval with compatible triplet representation
    Hao, Zhifeng
    Jin, Yaochu
    Yan, Xueming
    Wang, Chuyue
    Yang, Shangshang
    Ge, Hong
    NEUROCOMPUTING, 2024, 602
  • [47] Representation separation adversarial networks for cross-modal retrieval
    Deng, Jiaxin
    Ou, Weihua
    Gou, Jianping
    Song, Heping
    Wang, Anzhi
    Xu, Xing
    WIRELESS NETWORKS, 2024, 30 (05) : 3469 - 3481
  • [48] Towards Bridged Vision and Language: Learning Cross-Modal Knowledge Representation for Relation Extraction
    Feng, Junhao
    Wang, Guohua
    Zheng, Changmeng
    Cai, Yi
    Fu, Ze
    Wang, Yaowei
    Wei, Xiao-Yong
    Li, Qing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 561 - 575
  • [49] Deep Cross-Modal Representation Learning and Distillation for Illumination-Invariant Pedestrian Detection
    Liu, Tianshan
    Lam, Kin-Man
    Zhao, Rui
    Qiu, Guoping
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01) : 315 - 329
  • [50] Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation
    Wei, Kun
    Li, Bei
    Lv, Hang
    Lu, Quan
    Jiang, Ning
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2432 - 2444