Fine-Grained Correlation Learning with Stacked Co-attention Networks for Cross-Modal Information Retrieval

Authors
Lu, Yuhang [1, 2]
Yu, Jing [1 ]
Liu, Yanbing [1 ]
Tan, Jianlong [1 ]
Guo, Li [1 ]
Zhang, Weifeng [3, 4]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou, Peoples R China
[4] Zhejiang Future Technol Inst, Jiaxing, Peoples R China
Keywords
Stacked co-attention network; Graph convolution; Fine-grained cross-modal correlation
DOI
10.1007/978-3-319-99365-2_19
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-modal retrieval provides a flexible way to find semantically relevant information across different modalities given a query in one modality. The main challenge is measuring the similarity between data of different modalities. In general, different modalities contain unequal amounts of information when describing the same semantics. For example, textual descriptions often contain background information that images cannot convey, and vice versa. Most existing works map global data features from different modalities into a common semantic space to measure their similarity, which ignores the imbalanced and complementary relationships between the modalities. In this paper, we propose stacked co-attention networks (SCANet) to progressively learn the mutually attended features of different modalities and leverage these fine-grained correlations to enhance cross-modal retrieval performance. SCANet adopts a dual-path end-to-end framework to jointly learn the multimodal representations, stacked co-attention, and similarity metric. Experimental results on three widely used benchmark datasets verify that SCANet outperforms state-of-the-art methods, with an average MAP improvement of 19% in the best case.
Pages: 213-225
Number of pages: 13