Cross-modal Representation Learning with Nonlinear Dimensionality Reduction

Cited by: 0
|
Authors
Kaya, Semih [1 ]
Vural, Elif [1 ]
Affiliations
[1] Orta Dogu Tekn Univ, Elektr & Elekt Muhendisligi Bolumu, Ankara, Turkey
Keywords
Cross-modal learning; multi-view learning; nonlinear projections;
DOI
None available
CLC Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
In many machine learning problems, there exist relations between data collections from different modalities. The purpose of multi-modal learning algorithms is to use the information present in the different modalities efficiently when solving multi-modal retrieval problems. In this work, a multi-modal representation learning algorithm based on nonlinear dimensionality reduction is proposed. Compared to linear dimensionality reduction methods, nonlinear methods provide more flexible representations, especially when there is high discrepancy between the structures of the different modalities. We propose to align the different modalities by mapping same-class training data from different modalities to nearby coordinates, while also learning a Lipschitz-continuous interpolation function that generalizes the learnt representation to the whole data space. Experiments in image-text retrieval applications show that the proposed method yields high performance compared to multi-modal learning methods in the literature.
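The alignment idea described in the abstract (same-class training samples from each modality mapped to nearby shared coordinates, with a smooth interpolation function extending the mapping to the whole data space) can be illustrated with a toy sketch. This is not the paper's algorithm: the RBF interpolators, the fixed class anchors, and all parameter values below are illustrative assumptions standing in for the learned nonlinear embedding.

```python
import numpy as np

def rbf_interpolator(X_train, Y_target, gamma=0.1, reg=1e-6):
    """Fit an RBF map f(x) = sum_i w_i * exp(-gamma * ||x - x_i||^2).

    The ridge term `reg` bounds the weight norm, which keeps the
    interpolant smooth (a stand-in for the Lipschitz-continuity
    constraint mentioned in the abstract).
    """
    d2 = ((X_train[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    W = np.linalg.solve(K + reg * np.eye(len(X_train)), Y_target)

    def f(X):
        d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2) @ W

    return f

# Toy data: two modalities with different dimensions, two classes.
rng = np.random.default_rng(0)
labels = np.array([0, 0, 0, 1, 1, 1])
X_img = rng.normal(size=(6, 5)) + labels[:, None] * 3.0   # "image" features
X_txt = rng.normal(size=(6, 8)) - labels[:, None] * 3.0   # "text" features

# Shared target coordinates: same-class samples from both modalities
# are sent near a common class anchor in a 2-D latent space.
anchors = np.array([[0.0, 0.0], [5.0, 5.0]])
Y = anchors[labels]

f_img = rbf_interpolator(X_img, Y)
f_txt = rbf_interpolator(X_txt, Y)

# Cross-modal retrieval: embed both modalities in the shared space
# and rank text samples by distance to each image query.
z_img = f_img(X_img)
z_txt = f_txt(X_txt)
d = ((z_img[:, None, :] - z_txt[None, :, :]) ** 2).sum(-1)
nearest = d.argmin(axis=1)   # index of the closest text sample per image
```

Because the interpolators send the training samples almost exactly to their class anchors, an image query retrieves a text sample of the same class; the interpolation functions then extend this embedding to unseen samples of each modality.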
Pages: 4
Related Papers
50 records in total
  • [41] Auditory and cross-modal implicit learning
    Green, CD
    Groff, P
    INTERNATIONAL JOURNAL OF PSYCHOLOGY, 1996, 31 (3-4) : 15442 - 15442
  • [42] Continual learning in cross-modal retrieval
    Wang, Kai
    Herranz, Luis
    van de Weijer, Joost
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021, : 3623 - 3633
  • [43] Learning DALTS for cross-modal retrieval
    Yu, Zheng
    Wang, Wenmin
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4 (01) : 9 - 16
  • [44] Sequential Learning for Cross-modal Retrieval
    Song, Ge
    Tan, Xiaoyang
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019, : 4531 - 4539
  • [45] Cross-modal representation of human caretakers in squirrel monkeys
    Adachi, Ikuma
    Fujita, Kazuo
    BEHAVIOURAL PROCESSES, 2007, 74 (01) : 27 - 32
  • [46] Cross-modal hashing retrieval with compatible triplet representation
    Hao, Zhifeng
    Jin, Yaochu
    Yan, Xueming
    Wang, Chuyue
    Yang, Shangshang
    Ge, Hong
    NEUROCOMPUTING, 2024, 602
  • [47] Representation separation adversarial networks for cross-modal retrieval
    Deng, Jiaxin
    Ou, Weihua
    Gou, Jianping
    Song, Heping
    Wang, Anzhi
    Xu, Xing
    WIRELESS NETWORKS, 2024, 30 (05) : 3469 - 3481
  • [48] Towards Bridged Vision and Language: Learning Cross-Modal Knowledge Representation for Relation Extraction
    Feng, Junhao
    Wang, Guohua
    Zheng, Changmeng
    Cai, Yi
    Fu, Ze
    Wang, Yaowei
    Wei, Xiao-Yong
    Li, Qing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (01) : 561 - 575
  • [49] Deep Cross-Modal Representation Learning and Distillation for Illumination-Invariant Pedestrian Detection
    Liu, Tianshan
    Lam, Kin-Man
    Zhao, Rui
    Qiu, Guoping
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (01) : 315 - 329
  • [50] Conversational Speech Recognition by Learning Audio-Textual Cross-Modal Contextual Representation
    Wei, Kun
    Li, Bei
    Lv, Hang
    Lu, Quan
    Jiang, Ning
    Xie, Lei
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 2432 - 2444