Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning

Citations: 0
Authors
Huang, Zhao [1 ,2 ]
Hu, Haowu [2 ]
Su, Miao [2 ]
Affiliations
[1] Minist Educ, Key Lab Modern Teaching Technol, Xian 710062, Peoples R China
[2] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
dual attention network; data augmentation; cross-modal retrieval; enhanced relation network; canonical correlation analysis; network
DOI
10.3390/e25081216
CLC number
O4 [Physics];
Discipline code
0702;
Abstract
Information retrieval across multiple modalities has attracted much attention from academics and practitioners. One key challenge of cross-modal retrieval is to bridge the heterogeneity gap between different modalities. Most existing methods jointly construct a common subspace, but very little attention has been paid to the importance of different fine-grained regions within each modality, which significantly limits how well the information extracted from multiple modalities is exploited. Therefore, this study proposes a novel text-image cross-modal retrieval approach built on a dual attention network and an enhanced relation network (DAER). More specifically, the dual attention network precisely extracts fine-grained weight information from text and images, while the enhanced relation network widens the differences between different categories of data in order to improve the accuracy of similarity computation. Comprehensive experimental results on three widely used major datasets (i.e., Wikipedia, Pascal Sentence, and XMediaNet) show that the proposed approach is effective and superior to existing cross-modal retrieval methods.
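The two-stage pipeline the abstract describes (attention-weighted pooling of fine-grained regions, followed by a similarity score between the pooled image and text representations) can be sketched in miniature. This is an illustrative assumption, not the authors' DAER implementation: the function names are hypothetical, and plain cosine similarity stands in for the learned enhanced relation network.

```python
import math

def softmax(scores):
    # numerically stable softmax over a list of raw attention scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(region_feats, query):
    # score each fine-grained region (e.g. an image patch or word vector)
    # against a query vector via dot product, then return the
    # attention-weighted sum of the region features
    scores = [sum(r * q for r, q in zip(feat, query)) for feat in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    return [sum(w * feat[d] for w, feat in zip(weights, region_feats))
            for d in range(dim)]

def relation_score(img_vec, txt_vec):
    # cosine similarity as a simple stand-in for the learned relation network
    dot = sum(a * b for a, b in zip(img_vec, txt_vec))
    na = math.sqrt(sum(a * a for a in img_vec))
    nb = math.sqrt(sum(b * b for b in txt_vec))
    return dot / (na * nb)

# toy usage: two 2-D regions, query attends strongly to the first region
pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [10.0, 0.0])
score = relation_score(pooled, [1.0, 0.0])
```

In the paper's actual model, both the attention weights and the relation score are produced by trained networks; the sketch only shows where each component sits in the retrieval pipeline.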
Pages: 18
Related Papers
50 records
  • [41] Cross-Modal Hashing Retrieval Based on Deep Residual Network
    Li, Zhiyi
    Xu, Xiaomian
    Zhang, Du
    Zhang, Peng
    COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2021, 36 (02) : 383 - 405
  • [42] Cross-modal retrieval based on deep regularized hashing constraints
    Khan, Asad
    Hayat, Sakander
    Ahmad, Muhammad
    Wen, Jinyu
    Farooq, Muhammad Umar
    Fang, Meie
    Jiang, Wenchao
    INTERNATIONAL JOURNAL OF INTELLIGENT SYSTEMS, 2022, 37 (09) : 6508 - 6530
  • [43] Cross-Modal Discrete Representation Learning
    Liu, Alexander H.
    Jin, SouYoung
    Lai, Cheng-I Jeff
    Rouditchenko, Andrew
    Oliva, Aude
    Glass, James
    PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS), 2022, : 3013 - 3035
  • [44] Two-stage deep learning for supervised cross-modal retrieval
    Jie Shao
    Zhicheng Zhao
    Fei Su
    Multimedia Tools and Applications, 2019, 78 : 16615 - 16631
  • [45] Deep Cross-Modal Correlation Learning for Audio and Lyrics in Music Retrieval
    Yu, Yi
    Tang, Suhua
    Raposo, Francisco
    Chen, Lei
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2019, 15 (01)
  • [46] Two-stage deep learning for supervised cross-modal retrieval
    Shao, Jie
    Zhao, Zhicheng
    Su, Fei
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (12) : 16615 - 16631
  • [47] Weakly-paired deep dictionary learning for cross-modal retrieval
    Liu, Huaping
    Wang, Feng
    Zhang, Xinyu
    Sun, Fuchun
    PATTERN RECOGNITION LETTERS, 2020, 130 : 199 - 206
  • [48] MULTI-LEVEL CONTRASTIVE LEARNING FOR HYBRID CROSS-MODAL RETRIEVAL
    Zhao, Yiming
    Lu, Haoyu
    Zhao, Shiqi
    Wu, Haoran
    Lu, Zhiwu
    2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024, 2024, : 6390 - 6394
  • [49] Deep Semantic Mapping for Cross-Modal Retrieval
    Wang, Cheng
    Yang, Haojin
    Meinel, Christoph
    2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015), 2015, : 234 - 241
  • [50] Deep Relation Embedding for Cross-Modal Retrieval
    Zhang, Yifan
    Zhou, Wengang
    Wang, Min
    Tian, Qi
    Li, Houqiang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 617 - 627