Cross-Modal Correlation Learning with Deep Convolutional Architecture

Cited: 0
Authors
Hua, Yan [1 ]
Tian, Hu [2 ]
Cai, Anni [3 ]
Shi, Ping [1 ]
Affiliations
[1] Commun Univ China, Beijing, Peoples R China
[2] Fujitsu Res & Dev Ctr, Beijing, Peoples R China
[3] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
Keywords
Deep architecture; Convolution; Correlation learning; Large margin; Cross-modal retrieval;
DOI: not available
Chinese Library Classification: TP [Automation & Computer Technology]
Discipline Code: 0812
Abstract
With the explosive growth of online multimedia data, methodologies for retrieving documents across heterogeneous modalities are indispensable for facilitating information acquisition in real applications. Most existing research efforts focus on building correlation learning models on hand-crafted features for the visual and textual modalities. However, they lack the ability to capture meaningful patterns from the complicated visual modality, and they cannot identify the true correlation between modalities during the feature learning process. In this paper, we propose a novel cross-modal correlation learning method with a well-designed deep convolutional network to learn representations for the visual modality. A cross-modal correlation layer with a linear projection is added on top of the network, maximizing semantic consistency under a large-margin principle. All parameters are jointly optimized with stochastic gradient descent. With the deep architecture, our model is able to disentangle complex visual information and learn semantically consistent patterns in a layer-by-layer fashion. Experimental results on the widely used NUS-WIDE dataset show that our model outperforms state-of-the-art correlation learning methods built on six hand-crafted visual features for image-text retrieval.
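The correlation layer and large-margin objective described in the abstract can be sketched as follows. This is an illustrative NumPy reconstruction, not the authors' implementation: the feature dimensions, the cosine scoring, and the bidirectional hinge form of the ranking loss are all assumptions, and the CNN visual features are replaced by random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: CNN visual features, text features, shared space.
d_img, d_txt, d_common, n = 64, 32, 16, 8

# Linear projections into a common correlation space (a stand-in for the
# paper's cross-modal correlation layer; weights here are random, not learned).
W_img = rng.normal(scale=0.1, size=(d_img, d_common))
W_txt = rng.normal(scale=0.1, size=(d_txt, d_common))

img = rng.normal(size=(n, d_img))   # stand-in for deep CNN activations
txt = rng.normal(size=(n, d_txt))   # stand-in for textual features

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def margin_loss(img, txt, W_img, W_txt, margin=0.2):
    """Large-margin ranking loss: a matched image-text pair should score
    higher than any mismatched pair by at least `margin`."""
    u = l2_normalize(img @ W_img)          # projected visual embeddings
    v = l2_normalize(txt @ W_txt)          # projected text embeddings
    scores = u @ v.T                       # n x n cosine similarities
    pos = np.diag(scores)                  # matched-pair scores
    # Hinge penalties for ranking violations in both retrieval directions.
    cost_i2t = np.maximum(0.0, margin - pos[:, None] + scores)
    cost_t2i = np.maximum(0.0, margin - pos[None, :] + scores)
    np.fill_diagonal(cost_i2t, 0.0)        # matched pairs incur no penalty
    np.fill_diagonal(cost_t2i, 0.0)
    return (cost_i2t.sum() + cost_t2i.sum()) / n

loss = margin_loss(img, txt, W_img, W_txt)
print(loss >= 0.0)  # hinge terms are non-negative, so this prints True
```

In the paper this objective is minimized jointly with the convolutional layers by stochastic gradient descent; here the projections are fixed only to keep the sketch short.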
Pages: 4
Related Papers
50 in total
  • [21] Multimedia Feature Mapping and Correlation Learning for Cross-Modal Retrieval
    Yuan, Xu
    Zhong, Hua
    Chen, Zhikui
    Zhong, Fangming
    Hu, Yueming
    INTERNATIONAL JOURNAL OF GRID AND HIGH PERFORMANCE COMPUTING, 2018, 10 (03) : 29 - 45
  • [22] Show and Tell in the Loop: Cross-Modal Circular Correlation Learning
    Peng, Yuxin
    Qi, Jinwei
    IEEE TRANSACTIONS ON MULTIMEDIA, 2019, 21 (06) : 1538 - 1550
  • [23] Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation
    Hua, Yan
    Wang, Shuhui
    Liu, Siyuan
    Cai, Anni
    Huang, Qingming
    IEEE TRANSACTIONS ON MULTIMEDIA, 2016, 18 (06) : 1201 - 1216
  • [24] HCMSL: Hybrid Cross-modal Similarity Learning for Cross-modal Retrieval
    Zhang, Chengyuan
    Song, Jiayu
    Zhu, Xiaofeng
    Zhu, Lei
    Zhang, Shichao
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2021, 17 (01)
  • [25] CLASSIFICATION OF BREAST LESIONS USING CROSS-MODAL DEEP LEARNING
    Hadad, Omer
    Bakalo, Ran
    Ben-Ari, Rami
    Hashoul, Sharbell
    Amit, Guy
    2017 IEEE 14TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2017), 2017, : 109 - 112
  • [26] Learning Cross-Modal Deep Representations for Robust Pedestrian Detection
    Xu, Dan
    Ouyang, Wanli
    Ricci, Elisa
    Wang, Xiaogang
    Sebe, Nicu
    30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 4236 - 4244
  • [27] Deep Evidential Learning with Noisy Correspondence for Cross-modal Retrieval
    Qin, Yang
    Peng, Dezhong
    Peng, Xi
    Wang, Xu
    Hu, Peng
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 4948 - 4956
  • [28] Cross-Modal Retrieval using Random Multimodal Deep Learning
    Somasekar, Hemanth
    Naveen, Kavya
    JOURNAL OF MECHANICS OF CONTINUA AND MATHEMATICAL SCIENCES, 2019, 14 (02): : 185 - 200
  • [29] DRSL: Deep Relational Similarity Learning for Cross-modal Retrieval
    Wang, Xu
    Hu, Peng
    Zhen, Liangli
    Peng, Dezhong
    INFORMATION SCIENCES, 2021, 546 : 298 - 311
  • [30] Deep Cross-Modal Hashing With Ranking Learning for Noisy Labels
    Shu, Zhenqiu
    Bai, Yibing
    Yong, Kailing
    Yu, Zhengtao
    IEEE TRANSACTIONS ON BIG DATA, 2025, 11 (02) : 553 - 565