Cross-Modal Correlation Learning with Deep Convolutional Architecture

Cited by: 0
Authors
Hua, Yan [1]
Tian, Hu [2]
Cai, Anni [3]
Shi, Ping [1]
Affiliations
[1] Communication University of China, Beijing, People's Republic of China
[2] Fujitsu Research & Development Center, Beijing, People's Republic of China
[3] Beijing University of Posts and Telecommunications, Beijing, People's Republic of China
Source
2015 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2015
Keywords
Deep architecture; Convolution; Correlation learning; Large margin; Cross-modal retrieval;
DOI
Not available
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the explosive growth of online multimedia data, methodologies for retrieving documents across heterogeneous modalities are indispensable for facilitating information acquisition in real applications. Most existing research efforts focus on building correlation learning models on hand-crafted features for the visual and textual modalities. However, such models lack the ability to capture meaningful patterns from the complicated visual modality and cannot identify the true correlation between modalities during the feature learning process. In this paper, we propose a novel cross-modal correlation learning method with a well-designed deep convolutional network that learns representations from the visual modality. A cross-modal correlation layer with a linear projection is added on top of the network to maximize semantic consistency under a large-margin principle. All parameters are jointly optimized with stochastic gradient descent. With this deep architecture, our model is able to disentangle complex visual information and learn semantically consistent patterns in a layer-by-layer fashion. Experimental results on the widely used NUS-WIDE dataset show that our model outperforms state-of-the-art correlation learning methods built on six hand-crafted visual features for image-text retrieval.
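The abstract describes a two-branch design: a convolutional network encodes images, a linear projection forms the cross-modal correlation layer on top, and all parameters are trained jointly with stochastic gradient descent under a large-margin objective. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea; the network sizes, the hinge ranking loss, and every identifier are assumptions for illustration, not the authors' actual architecture.

```python
# Hypothetical sketch of a two-branch cross-modal correlation model:
# a small CNN encodes images, a linear projection maps pre-extracted text
# features, and a large-margin (hinge) ranking loss pulls matched image-text
# pairs above mismatched ones. Not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageBranch(nn.Module):
    """Toy convolutional encoder for the visual modality."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, embed_dim)  # linear projection into the common space

    def forward(self, x):
        h = self.features(x).flatten(1)
        return F.normalize(self.proj(h), dim=1)

class TextBranch(nn.Module):
    """Linear projection of a pre-extracted text feature (e.g. a tag vector)."""
    def __init__(self, text_dim=1000, embed_dim=128):
        super().__init__()
        self.proj = nn.Linear(text_dim, embed_dim)

    def forward(self, t):
        return F.normalize(self.proj(t), dim=1)

def large_margin_loss(img_emb, txt_emb, margin=0.2):
    """Hinge ranking loss: matched pairs should score higher than
    mismatched in-batch pairs by at least `margin`."""
    sim = img_emb @ txt_emb.t()                       # pairwise cosine similarities
    pos = sim.diag().unsqueeze(1)                     # matched image-text pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    return cost.mean()

if __name__ == "__main__":
    img_net, txt_net = ImageBranch(), TextBranch()
    params = list(img_net.parameters()) + list(txt_net.parameters())
    opt = torch.optim.SGD(params, lr=0.01, momentum=0.9)  # joint SGD optimization

    images = torch.randn(8, 3, 64, 64)                # dummy image batch
    texts = torch.randn(8, 1000)                      # dummy text features
    for _ in range(3):
        opt.zero_grad()
        loss = large_margin_loss(img_net(images), txt_net(texts))
        loss.backward()
        opt.step()
        print(f"loss: {loss.item():.4f}")
```

The key design choice reflected here is that both modalities are pushed into a shared embedding space and supervised only through a relative margin between matched and mismatched pairs, so the visual and textual parameters can be optimized jointly end to end.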
Pages: 4
Related papers (50 in total)
  • [41] Lin, Qiubin; Cao, Wenming; He, Zhihai; He, Zhiquan. Semantic deep cross-modal hashing. NEUROCOMPUTING, 2020, 396: 113-122
  • [42] Zhen, Liangli; Hu, Peng; Wang, Xu; Peng, Dezhong. Deep Supervised Cross-modal Retrieval. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 10386-10395
  • [43] Liong, Venice Erin; Lu, Jiwen; Tan, Yap-Peng; Zhou, Jie. Cross-Modal Deep Variational Hashing. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 4097-4105
  • [44] Xu, Liming; Li, Hanqi; Zheng, Bochuan; Li, Weisheng; Lv, Jiancheng. Deep Lifelong Cross-Modal Hashing. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(12): 13478-13493
  • [45] Gu, Jingzi; Zhang, JinChao; Lin, Zheng; Li, Bo; Wang, Weiping; Meng, Dan. Asymmetric Deep Cross-modal Hashing. COMPUTATIONAL SCIENCE - ICCS 2019, PT V, 2019, 11540: 41-54
  • [46] Peng, Yuxin; Qi, Jinwei; Huang, Xin; Yuan, Yuxin. CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20(2): 405-420
  • [47] Wang, Chaoyi; Li, Liang; Yan, Chenggang; Wang, Zhan; Sun, Yaoqi; Zhang, Jiyong. Cross-modal semantic correlation learning by Bi-CNN network. IET IMAGE PROCESSING, 2021, 15(14): 3674-3684
  • [48] Hua, Yan; Wang, Shuhui; Liu, Siyuan; Huang, Qingming; Cai, Anni. TINA: Cross-modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation. 2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014: 190-199
  • [49] Ye, Zhaoda; Peng, Yuxin. Multi-Scale Correlation for Sequential Cross-modal Hashing Learning. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 852-860
  • [50] Huang, Xin; Peng, Yuxin. Cross-modal Deep Metric Learning with Multi-task Regularization. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 943-948