EENet: embedding enhancement network for compositional image-text retrieval using generated text

Cited by: 0
Authors
Hur, Chan [1]
Park, Hyeyoung [1]
Affiliation
[1] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea
Keywords
Compositional Image-Text Retrieval; Image-Captioning; Joint embedding; Visual Feature Enhancement; Textual Feature Generation
DOI
10.1007/s11042-023-17531-y
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we consider the compositional image-text retrieval task, which searches for suitable target images given a query that consists of a reference image and feedback text. For instance, when a user finds a dress on an e-commerce site that meets all of their needs except for its length and decoration, the user can give the system sentence-form feedback such as "I like this dress, but I wish it was a little shorter and had no ribbon." This is a practical scenario for advanced retrieval systems and applies to interactive search systems and e-commerce systems. To tackle this task, we propose the Embedding Enhancement Network (EENet), which includes a text generation module and an image feature enhancement module that uses the generated text. Whereas conventional works mainly focus on developing an efficient module for composing the given image and text query, EENet actively generates an additional textual description to enhance the image feature vector in the embedding space. This design is inspired by the human ability to recognize an object using visual perception together with prior textual information. We also introduce a new training loss to ensure that images and the additional generated texts are well combined. Experimental results show that EENet achieves considerable improvements in retrieval performance: on the Recall@1 metric, it improves over the baseline model by 3.4% on Fashion200k and by 1.4% on MIT-States.
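The abstract describes the retrieval pipeline only at a high level. The minimal sketch below illustrates the general idea of compositional retrieval with a text-enhanced image feature and how Recall@K is computed. It is not EENet's actual architecture. The functions `enhance_image_feature` and `compose_query`, the mixing weight `alpha`, and the additive composition are hypothetical stand-ins. The caption embedding is assumed to be precomputed, the enhancement is applied only to the reference image, and neither the text-generation module nor the paper's new training loss is modeled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def enhance_image_feature(image: Sequence[float], generated_text: Sequence[float],
                          alpha: float = 0.5) -> Vector:
    """HYPOTHETICAL enhancement: convex mix of an image embedding and the
    embedding of a caption generated for that image (alpha is an assumption)."""
    mixed = [(1 - alpha) * i + alpha * t for i, t in zip(image, generated_text)]
    return l2_normalize(mixed)


def compose_query(image: Sequence[float], feedback: Sequence[float]) -> Vector:
    """HYPOTHETICAL composition: add the feedback-text embedding to the
    (enhanced) reference-image embedding and renormalize."""
    return l2_normalize([i + f for i, f in zip(image, feedback)])


def recall_at_k(queries: Sequence[Sequence[float]], gallery: Sequence[Sequence[float]],
                targets: Sequence[int], k: int) -> float:
    """Fraction of queries whose target gallery index ranks in the top k by cosine."""
    hits = 0
    for query, target in zip(queries, targets):
        ranked = sorted(range(len(gallery)),
                        key=lambda j: cosine(query, gallery[j]), reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Toy 2-D embeddings, invented purely for illustration.
reference = [1.0, 0.0]   # reference-image embedding
caption = [0.8, 0.6]     # embedding of a caption generated for the reference image
feedback = [0.0, 1.0]    # embedding of the user's feedback text
gallery = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]

query = compose_query(enhance_image_feature(reference, caption), feedback)
print(recall_at_k([query], gallery, targets=[2], k=1))  # 1.0
```

Additive composition is used here only because it is the simplest operator to read. Systems for this task learn both the composition and the enhancement modules and train them with dedicated losses, such as the one this paper introduces.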
Pages: 49689-49705
Page count: 17
Related papers (50 in total)
  • [31] Cross-modal Graph Matching Network for Image-text Retrieval
    Cheng, Yuhao
    Zhu, Xiaoguang
    Qian, Jiuchao
    Wen, Fei
    Liu, Peilin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (04)
  • [32] HAAN: Learning a Hierarchical Adaptive Alignment Network for Image-Text Retrieval
    Wang, Shuhuai
    Liu, Zheng
    Pei, Xinlei
    Xu, Junhao
    SENSORS, 2023, 23 (05)
  • [33] Semantic Completion and Filtration for Image-Text Retrieval
    Yang, Song
    Li, Qiang
    Li, Wenhui
    Li, Xuan-Ya
    Jin, Ran
    Lv, Bo
    Wang, Rui
    Liu, Anan
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)
  • [34] Cross-modal independent matching network for image-text retrieval
    Ke, Xiao
    Chen, Baitao
    Yang, Xiong
    Cai, Yuhang
    Liu, Hao
    Guo, Wenzhong
    PATTERN RECOGNITION, 2025, 159
  • [35] Global Relation-Aware Attention Network for Image-Text Retrieval
    Cao, Jie
    Qian, Shengsheng
    Zhang, Huaiwen
    Fang, Quan
    Xu, Changsheng
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021 : 19 - 28
  • [36] Learning Multi-view Embedding in Joint Space for Bidirectional Image-Text Retrieval
    Ran, Lu
    Wang, Wenmin
    2017 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2017
  • [37] Context-aware relation enhancement and similarity reasoning for image-text retrieval
    Cui, Zheng
    Hu, Yongli
    Sun, Yanfeng
    Yin, Baocai
    IET COMPUTER VISION, 2024, 18 (05) : 652 - 665
  • [38] Multi-view and region reasoning semantic enhancement for image-text retrieval
    Cheng, Wengang
    Han, Ziyi
    He, Di
    Wu, Lifang
    MULTIMEDIA SYSTEMS, 2024, 30 (04)
  • [39] MKVSE: Multimodal Knowledge Enhanced Visual-semantic Embedding for Image-text Retrieval
    Feng, Duoduo
    He, Xiangteng
    Peng, Yuxin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (05)
  • [40] Dual-Level Representation Enhancement on Characteristic and Context for Image-Text Retrieval
    Yang, Song
    Li, Qiang
    Li, Wenhui
    Li, Xuanya
    Liu, An-An
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (11) : 8037 - 8050