Cross-Modal Correlation Learning with Deep Convolutional Architecture

Cited by: 0
Authors
Hua, Yan [1]
Tian, Hu [2]
Cai, Anni [3]
Shi, Ping [1]
Affiliations
[1] Communication University of China, Beijing, People's Republic of China
[2] Fujitsu Research & Development Center, Beijing, People's Republic of China
[3] Beijing University of Posts and Telecommunications, Beijing, People's Republic of China
Source
2015 VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2015
Keywords
Deep architecture; Convolution; Correlation learning; Large margin; Cross-modal retrieval;
DOI
Not available
Chinese Library Classification (CLC)
TP [automation technology, computer technology];
Discipline classification code
0812;
Abstract
With the explosive growth of online multimedia data, methodologies for retrieving documents across heterogeneous modalities are indispensable for facilitating information acquisition in real applications. Most existing research efforts focus on building correlation learning models on hand-crafted features for the visual and textual modalities. However, such models lack the ability to capture meaningful patterns from the complicated visual modality and cannot identify the true correlation between modalities during the feature learning process. In this paper, we propose a novel cross-modal correlation learning method with a well-designed deep convolutional network that learns representations from the visual modality. A cross-modal correlation layer with a linear projection is added on top of the network to maximize semantic consistency under a large-margin principle. All parameters are jointly optimized with stochastic gradient descent. With this deep architecture, our model is able to disentangle complex visual information and learn semantically consistent patterns in a layer-by-layer fashion. Experimental results on the widely used NUS-WIDE dataset show that our model outperforms state-of-the-art correlation learning methods built on six hand-crafted visual features for image-text retrieval.
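The abstract describes a two-branch design: a convolutional network encodes images, a linear projection forms the cross-modal correlation layer on top, and all parameters are trained jointly with stochastic gradient descent under a large-margin objective. The sketch below is a minimal, hypothetical PyTorch illustration of that general idea; the network sizes, the hinge ranking loss, and every identifier are assumptions for illustration, not the authors' actual architecture.

```python
# Hypothetical sketch of a two-branch cross-modal correlation model:
# a small CNN encodes images, a linear projection maps pre-extracted text
# features, and a large-margin (hinge) ranking loss pulls matched image-text
# pairs above mismatched ones. Not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ImageBranch(nn.Module):
    """Toy convolutional encoder for the visual modality."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, embed_dim)  # linear projection into the common space

    def forward(self, x):
        h = self.features(x).flatten(1)
        return F.normalize(self.proj(h), dim=1)

class TextBranch(nn.Module):
    """Linear projection of a pre-extracted text feature (e.g. a tag vector)."""
    def __init__(self, text_dim=1000, embed_dim=128):
        super().__init__()
        self.proj = nn.Linear(text_dim, embed_dim)

    def forward(self, t):
        return F.normalize(self.proj(t), dim=1)

def large_margin_loss(img_emb, txt_emb, margin=0.2):
    """Hinge ranking loss: matched pairs should score higher than
    mismatched in-batch pairs by at least `margin`."""
    sim = img_emb @ txt_emb.t()                       # pairwise cosine similarities
    pos = sim.diag().unsqueeze(1)                     # matched image-text pairs
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    cost = (margin + sim - pos).clamp(min=0).masked_fill(mask, 0)
    return cost.mean()

if __name__ == "__main__":
    img_net, txt_net = ImageBranch(), TextBranch()
    params = list(img_net.parameters()) + list(txt_net.parameters())
    opt = torch.optim.SGD(params, lr=0.01, momentum=0.9)  # joint SGD optimization

    images = torch.randn(8, 3, 64, 64)                # dummy image batch
    texts = torch.randn(8, 1000)                      # dummy text features
    for _ in range(3):
        opt.zero_grad()
        loss = large_margin_loss(img_net(images), txt_net(texts))
        loss.backward()
        opt.step()
        print(f"loss: {loss.item():.4f}")
```

The key design choice reflected here is that both modalities are pushed into a shared embedding space and supervised only through a relative margin between matched and mismatched pairs, so the visual and textual parameters can be optimized jointly end to end.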
Pages: 4
Related papers (50 in total)
  • [41] Lin, Qiubin; Cao, Wenming; He, Zhihai; He, Zhiquan. Semantic deep cross-modal hashing. NEUROCOMPUTING, 2020, 396: 113-122
  • [42] Zhen, Liangli; Hu, Peng; Wang, Xu; Peng, Dezhong. Deep Supervised Cross-modal Retrieval. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 10386-10395
  • [43] Liong, Venice Erin; Lu, Jiwen; Tan, Yap-Peng; Zhou, Jie. Cross-Modal Deep Variational Hashing. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 4097-4105
  • [44] Xu, Liming; Li, Hanqi; Zheng, Bochuan; Li, Weisheng; Lv, Jiancheng. Deep Lifelong Cross-Modal Hashing. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(12): 13478-13493
  • [45] Gu, Jingzi; Zhang, JinChao; Lin, Zheng; Li, Bo; Wang, Weiping; Meng, Dan. Asymmetric Deep Cross-modal Hashing. COMPUTATIONAL SCIENCE - ICCS 2019, PT V, 2019, 11540: 41-54
  • [46] Peng, Yuxin; Qi, Jinwei; Huang, Xin; Yuan, Yuxin. CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network. IEEE TRANSACTIONS ON MULTIMEDIA, 2018, 20(2): 405-420
  • [47] Wang, Chaoyi; Li, Liang; Yan, Chenggang; Wang, Zhan; Sun, Yaoqi; Zhang, Jiyong. Cross-modal semantic correlation learning by Bi-CNN network. IET IMAGE PROCESSING, 2021, 15(14): 3674-3684
  • [48] Hua, Yan; Wang, Shuhui; Liu, Siyuan; Huang, Qingming; Cai, Anni. TINA: Cross-modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation. 2014 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2014: 190-199
  • [49] Ye, Zhaoda; Peng, Yuxin. Multi-Scale Correlation for Sequential Cross-modal Hashing Learning. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 852-860
  • [50] Huang, Xin; Peng, Yuxin. Cross-modal Deep Metric Learning with Multi-task Regularization. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 943-948