Hybrid DAER Based Cross-Modal Retrieval Exploiting Deep Representation Learning

Cited: 0
Authors
Huang, Zhao [1 ,2 ]
Hu, Haowu [2 ]
Su, Miao [2 ]
Affiliations
[1] Minist Educ, Key Lab Modern Teaching Technol, Xian 710062, Peoples R China
[2] Shaanxi Normal Univ, Sch Comp Sci, Xian 710119, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
dual attention network; data augmentation; cross-modal retrieval; enhanced relation network; canonical correlation analysis; network
DOI
10.3390/e25081216
Chinese Library Classification
O4 [Physics];
Discipline Code
0702;
Abstract
Information retrieval across multiple modalities has attracted much attention from academics and practitioners. One key challenge of cross-modal retrieval is to bridge the heterogeneity gap between modalities. Most existing methods jointly construct a common subspace, but they pay little attention to the relative importance of the fine-grained regions within each modality, which limits how well the extracted multimodal information is exploited. Therefore, this study proposes a novel text-image cross-modal retrieval approach that combines a dual attention network with an enhanced relation network (DAER). More specifically, the dual attention network precisely extracts fine-grained weight information from text and images, while the enhanced relation network widens the differences between data of different categories in order to improve the accuracy of similarity computation. Comprehensive experimental results on three widely used datasets (i.e., Wikipedia, Pascal Sentence, and XMediaNet) show that our proposed approach is effective and superior to existing cross-modal retrieval methods.
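For intuition only, the following is a minimal PyTorch-style sketch of the two ideas the abstract describes: an attention branch that weights fine-grained region/word features, and a learned relation head that scores text-image similarity. All module names, dimensions, and architectural details are illustrative assumptions, not the authors' released DAER code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FineGrainedAttention(nn.Module):
    """Weights fine-grained region/word features by learned importance.

    Illustrative stand-in for one branch of the paper's dual attention
    network; the actual DAER architecture may differ.
    """

    def __init__(self, feat_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (batch, n_regions_or_words, feat_dim)
        weights = F.softmax(self.score(feats), dim=1)  # (batch, n, 1)
        return (weights * feats).sum(dim=1)            # (batch, feat_dim)


class RelationHead(nn.Module):
    """Scores a text-image pair with a small learned metric instead of a
    fixed cosine distance, loosely following the enhanced-relation idea."""

    def __init__(self, embed_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, 1),
        )

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # Concatenate the two modality embeddings and regress a relation score.
        return self.mlp(torch.cat([img, txt], dim=-1)).squeeze(-1)


# Toy usage: 4 image-text pairs; 36 image regions and 20 words, 512-d each.
img_att, txt_att = FineGrainedAttention(512), FineGrainedAttention(512)
relation = RelationHead(512)
img_regions = torch.randn(4, 36, 512)
txt_words = torch.randn(4, 20, 512)
scores = relation(img_att(img_regions), txt_att(txt_words))
print(scores.shape)  # torch.Size([4])
```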
Pages: 18
Related Papers
50 records in total
  • [31] Guan, Ziyu; Zhao, Wanqing; Liu, Hongmin; Nakashima, Yuta; Babaguchi, Noboru; He, Xiaofei. Cross-Modal Guided Visual Representation Learning for Social Image Retrieval. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2025, 47(03): 2186-2198.
  • [32] Li, Zhuoyi; Lu, Huibin; Fu, Hao; Wang, Zhongrui; Gu, Guanghua. Adaptive Adversarial Learning Based Cross-Modal Retrieval. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 123.
  • [33] Li, Zhuoyi; Fu, Hao; Gu, Guanghua. Semantic Supervised Learning Based Cross-Modal Retrieval. PROCEEDINGS OF THE ACM TURING AWARD CELEBRATION CONFERENCE-CHINA 2024, ACM-TURC 2024, 2024: 207-209.
  • [34] Huang, Xin; Peng, Yuxin; Yuan, Mingkuan. Cross-Modal Common Representation Learning by Hybrid Transfer Network. PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017: 1893-1900.
  • [35] Wang, Kai; Herranz, Luis; van de Weijer, Joost. Continual Learning in Cross-Modal Retrieval. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW 2021, 2021: 3623-3633.
  • [36] Yu, Zheng; Wang, Wenmin. Learning DALTS for Cross-Modal Retrieval. CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2019, 4(01): 9-16.
  • [37] Song, Ge; Tan, Xiaoyang. Sequential Learning for Cross-Modal Retrieval. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS (ICCVW), 2019: 4531-4539.
  • [38] Li, Shenshen; Xu, Xing; Yang, Yang; Shen, Fumin; Mo, Yijun; Li, Yujie; Shen, Heng Tao. DCEL: Deep Cross-Modal Evidential Learning for Text-Based Person Retrieval. PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023: 6292-6300.
  • [39] Huang, Fei; Jin, Cheng; Zhang, Yuejie; Zhang, Tao. Towards Sketch-Based Image Retrieval with Deep Cross-Modal Correlation Learning. 2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2017: 907-912.
  • [40] Bhatt, Nikita; Ganatra, Amit. Improvement of Deep Cross-Modal Retrieval by Generating Real-Valued Representation. PEERJ COMPUTER SCIENCE, 2021, 7: 1-18.