Fine-Grained Correlation Learning with Stacked Co-attention Networks for Cross-Modal Information Retrieval

Authors
Lu, Yuhang [1, 2]
Yu, Jing [1]
Liu, Yanbing [1]
Tan, Jianlong [1]
Guo, Li [1]
Zhang, Weifeng [3, 4]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou, Peoples R China
[4] Zhejiang Future Technol Inst, Jiaxing, Peoples R China
Keywords
Stacked co-attention network; Graph convolution; Fine-grained cross-modal correlation
DOI
10.1007/978-3-319-99365-2_19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-modal retrieval provides a flexible way to find semantically relevant information across different modalities given a query in one modality. The main challenge is measuring the similarity between data from different modalities. In general, different modalities contain unequal amounts of information when describing the same semantics. For example, textual descriptions often contain background information that images cannot convey, and vice versa. Most existing works map global data features from different modalities into a common semantic space to measure their similarity, ignoring the imbalanced and complementary relationships between modalities. In this paper, we propose stacked co-attention networks (SCANet) to progressively learn the mutually attended features of different modalities and leverage these fine-grained correlations to enhance cross-modal retrieval performance. SCANet adopts a dual-path end-to-end framework to jointly learn the multimodal representations, stacked co-attention, and similarity metric. Experimental results on three widely used benchmark datasets verify that SCANet outperforms state-of-the-art methods, with an average improvement of 19% in mean average precision (MAP) in the best case.
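To make the idea of "progressively learning mutually attended features" concrete, the following is a minimal, generic sketch of a stacked co-attention loop between image-region features and word features, followed by a cosine similarity score. It is NOT the SCANet architecture from the paper: the additive (tanh) attention form, the mean-pooled initial queries, the residual query update, the random weights, and all names (`attend`, `stacked_co_attention`, `w_f`, `w_q`, `w_p`) are hypothetical assumptions chosen for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attend(features, query, w_f, w_q, w_p):
    """One attention hop: weight the rows of `features` by relevance to `query`.

    features: (n, d) local features of one modality (image regions or words)
    query:    (d,)   guiding vector from the other modality
    w_f, w_q: (d, k) projections (hypothetical learned weights)
    w_p:      (k,)   scoring vector (hypothetical learned weight)
    """
    hidden = np.tanh(features @ w_f + query @ w_q)  # (n, k); query broadcast over rows
    alpha = softmax(hidden @ w_p)                    # (n,) attention weights summing to 1
    return alpha @ features, alpha                   # attended vector (d,) and weights

def stacked_co_attention(img, txt, params, num_layers=2):
    """Alternate text-guided image attention and image-guided text attention.

    Each layer refines both queries by adding the newly attended vectors
    (an assumed residual update in the spirit of stacked attention).
    """
    u_img, u_txt = img.mean(axis=0), txt.mean(axis=0)  # global initial queries
    for layer in params[:num_layers]:
        v_att, _ = attend(img, u_txt, *layer["img"])   # image regions attended by text
        t_att, _ = attend(txt, u_img, *layer["txt"])   # words attended by image
        u_img, u_txt = u_img + v_att, u_txt + t_att    # both updates use previous queries
    return u_img, u_txt

def cosine(a, b):
    # Cosine similarity used here as a stand-in for a learned similarity metric.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Usage with random (hypothetical) weights and features.
rng = np.random.default_rng(0)
d, k = 8, 4

def init_layer():
    return (rng.normal(size=(d, k)), rng.normal(size=(d, k)), rng.normal(size=k))

params = [{"img": init_layer(), "txt": init_layer()} for _ in range(2)]
img = rng.normal(size=(5, d))   # 5 image-region features
txt = rng.normal(size=(7, d))   # 7 word features
u_img, u_txt = stacked_co_attention(img, txt, params)
print(u_img.shape, u_txt.shape, cosine(u_img, u_txt))
```

Each extra layer lets each modality re-attend to the other using a query that already carries the previous layer's attended evidence, which is one plausible reading of "progressive" attention; the paper itself should be consulted for the actual formulation.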
Pages: 213-225
Page count: 13
Related Papers
  • [1] Fine-Grained Label Learning via Siamese Network for Cross-modal Information Retrieval
    Xu, Yiming
    Yu, Jing
    Guo, Jingjing
    Hu, Yue
    Tan, Jianlong
    COMPUTATIONAL SCIENCE - ICCS 2019, PT II, 2019, 11537 : 304 - 317
  • [2] Fine-Grained Cross-Modal Contrast Learning for Video-Text Retrieval
    Liu, Hui
    Lv, Gang
    Gu, Yanhong
    Nian, Fudong
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT V, ICIC 2024, 2024, 14866 : 298 - 310
  • [3] Fine-Grained Cross-Modal Retrieval for Cultural Items with Focal Attention and Hierarchical Encodings
    Sheng, Shurong
    Laenen, Katrien
    Van Gool, Luc
    Moens, Marie-Francine
    COMPUTERS, 2021, 10 (09)
  • [4] A Cross-modal Attention Model for Fine-Grained Incident Retrieval from Dashcam Videos
    Pham, Dinh-Duy
    Dao, Minh-Son
    Nguyen, Thanh-Binh
    MULTIMEDIA MODELING, MMM 2023, PT I, 2023, 13833 : 409 - 420
  • [5] Deep cascaded cross-modal correlation learning for fine-grained sketch-based image retrieval
    Wang, Yanfei
    Huang, Fei
    Zhang, Yuejie
    Feng, Rui
    Zhang, Tao
    Fan, Weiguo
    PATTERN RECOGNITION, 2020, 100 (100)
  • [6] Cross-modal Contrastive Learning with Asymmetric Co-attention Network for Video Moment Retrieval
    Panta, Love
    Shrestha, Prashant
    Sapkota, Brabeem
    Bhattarai, Amrita
    Manandhar, Suresh
    Sah, Anand Kumar
    2024 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WORKSHOPS, WACVW 2024, 2024 : 617 - 624
  • [7] Learning cross-modal correlations by exploring inter-word semantics and stacked co-attention
    Yu, Jing
    Lu, Yuhang
    Zhang, Weifeng
    Qin, Zengchang
    Liu, Yanbing
    Hu, Yue
    PATTERN RECOGNITION LETTERS, 2020, 130 (130) : 189 - 198
  • [8] Deep attentional fine-grained similarity network with adversarial learning for cross-modal retrieval
    Cheng, Qingrong
    Gu, Xiaodong
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (41-42) : 31401 - 31428
  • [9] Cross-modal subspace learning for fine-grained sketch-based image retrieval
    Xu, Peng
    Yin, Qiyue
    Huang, Yongye
    Song, Yi-Zhe
    Ma, Zhanyu
    Wang, Liang
    Xiang, Tao
    Kleijn, W. Bastiaan
    Guo, Jun
    NEUROCOMPUTING, 2018, 278 : 75 - 86