CCL: Cross-modal Correlation Learning With Multigrained Fusion by Hierarchical Network

Cited by: 186
Authors
Peng, Yuxin [1]
Qi, Jinwei [1]
Huang, Xin [1]
Yuan, Yuxin [1]
Affiliation
[1] Peking Univ, Inst Comp Sci & Technol, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cross-modal retrieval; fine-grained correlation; joint optimization; multi-task learning; REPRESENTATION; MODEL;
DOI
10.1109/TMM.2017.2742704
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
Cross-modal retrieval has become a prominent research topic for retrieval across multimedia data such as image and text. Most existing methods based on deep neural networks (DNNs) adopt a two-stage learning framework: the first learning stage generates a separate representation for each modality, and the second learning stage learns the cross-modal common representation. However, the existing methods have three limitations: 1) In the first learning stage, they only model intramodality correlation, but ignore intermodality correlation with rich complementary context. 2) In the second learning stage, they only adopt shallow networks with single-loss regularization, but ignore the intrinsic relevance of intramodality and intermodality correlation. 3) Only original instances are considered, while the complementary fine-grained clues provided by their patches are ignored. To address these problems, this paper proposes a cross-modal correlation learning (CCL) approach with multigrained fusion by hierarchical network, and the contributions are as follows: 1) In the first learning stage, CCL exploits multilevel association with joint optimization to preserve the complementary context from intramodality and intermodality correlation simultaneously. 2) In the second learning stage, a multitask learning strategy is designed to adaptively balance the intramodality semantic category constraints and intermodality pairwise similarity constraints. 3) CCL adopts multigrained modeling, which fuses the coarse-grained instances and fine-grained patches to make cross-modal correlation more precise. Compared with 13 state-of-the-art methods on 6 widely used cross-modal datasets, the experimental results show that the CCL approach achieves the best performance.
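The two constraints named in the second-stage contribution can be illustrated with a toy loss. This is a minimal sketch, not the authors' implementation: the abstract gives no loss formulas, so the softmax cross-entropy, the contrastive pairwise loss, the fixed `weight` (the paper describes adaptive balancing, which is not reproduced here), and all function names are assumptions made for illustration.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample (assumed form of the
    intramodality semantic category constraint)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def pairwise_loss(img_vec, txt_vec, similar, margin=1.0):
    """Contrastive loss on one image/text pair (assumed form of the
    intermodality pairwise similarity constraint)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(img_vec, txt_vec)))
    if similar:
        return dist ** 2  # pull matching pairs together
    return max(0.0, margin - dist) ** 2  # push mismatched pairs apart

def multitask_loss(img_logits, txt_logits, label,
                   img_vec, txt_vec, similar, weight=0.5):
    """Weighted sum of the two tasks; `weight` is a fixed, hypothetical
    stand-in for the adaptive balancing mentioned in the abstract."""
    category = cross_entropy(img_logits, label) + cross_entropy(txt_logits, label)
    pairwise = pairwise_loss(img_vec, txt_vec, similar)
    return weight * category + (1.0 - weight) * pairwise
```

Setting `weight` to 1.0 or 0.0 reduces the sketch to a single-loss objective, the setting the abstract lists as a limitation of earlier methods.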
Pages: 405-420
Number of pages: 16
Related papers
50 in total
  • [21] Triplet Fusion Network Hashing for Unpaired Cross-Modal Retrieval
    Hu, Zhikai
    Liu, Xin
    Wang, Xingzhi
    Cheung, Yiu-ming
    Wang, Nannan
    Chen, Yewang
    ICMR'19: PROCEEDINGS OF THE 2019 ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, 2019, : 141 - 149
  • [22] Cross-modal evidential fusion network for social media classification
    Yu, Chen
    Wang, Zhiguo
    COMPUTER SPEECH AND LANGUAGE, 2025, 92
  • [23] Bilateral Cross-Modal Fusion Network for Robot Grasp Detection
    Zhang, Qiang
    Sun, Xueying
    SENSORS, 2023, 23 (06)
  • [24] CMFFN: An efficient cross-modal feature fusion network for semantic
    Zhang, Yingjian
    Li, Ning
    Jiao, Jichao
    Ai, Jiawen
    Yan, Zheng
    Zeng, Yingchao
    Zhang, Tianxiang
    Li, Qian
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2025, 186
  • [25] Semantic Guidance Fusion Network for Cross-Modal Semantic Segmentation
    Zhang, Pan
    Chen, Ming
    Gao, Meng
    SENSORS, 2024, 24 (08)
  • [26] LEARNING A CROSS-MODAL HASHING NETWORK FOR MULTIMEDIA SEARCH
    Liong, Venice Erin
    Lu, Jiwen
    Tan, Yap-Peng
    2017 24TH IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2017, : 3700 - 3704
  • [27] Cross-modal dual subspace learning with adversarial network
    Shang, Fei
    Zhang, Huaxiang
    Sun, Jiande
    Nie, Liqiang
    Liu, Li
    NEURAL NETWORKS, 2020, 126 : 132 - 142
  • [28] Supervised Hierarchical Cross-Modal Hashing
    Sun, Changchang
    Song, Xuemeng
    Feng, Fuli
    Zhao, Wayne Xin
    Zhang, Hao
    Nie, Liqiang
    PROCEEDINGS OF THE 42ND INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '19), 2019, : 725 - 734
  • [29] Cross-Modal Correlation Learning by Adaptive Hierarchical Semantic Aggregation (vol 18, pg 1201, 2016)
    Hua, Yan
    Wang, Shuhui
    Liu, Siyuan
    Cai, Anni
    Huang, Qingming
    IEEE TRANSACTIONS ON MULTIMEDIA, 2016, 18 (10) : 2127 - 2127
  • [30] A General Cross-Modal Correlation Learning Method for Remote Sensing
    Lü Y.
    Xiong W.
    Zhang X.
    Wuhan Daxue Xuebao (Xinxi Kexue Ban)/Geomatics and Information Science of Wuhan University, 2022, 47 (11) : 1887 - 1895