EENet: embedding enhancement network for compositional image-text retrieval using generated text

Authors
Hur, Chan [1]
Park, Hyeyoung [1]
Affiliation
[1] Kyungpook Natl Univ, Sch Comp Sci & Engn, Daegu, South Korea
Keywords
Compositional Image-Text Retrieval; Image-Captioning; Joint embedding; Visual Feature Enhancement; Textual Feature Generation;
DOI
10.1007/s11042-023-17531-y
Chinese Library Classification (CLC) number
TP [Automation and computer technology];
Discipline classification code
0812;
Abstract
In this paper, we consider the compositional image-text retrieval task, which searches for appropriate target images given a query consisting of a reference image and feedback text. For instance, when a user finds a dress on an e-commerce site that meets all their needs except for the length and decoration, the user can give the system sentence-form feedback, e.g., "I like this dress, but I wish it was a little shorter and had no ribbon." This is a practical scenario for advanced retrieval systems and is applicable to interactive search systems and e-commerce systems. To tackle this task, we propose the Embedding Enhancement Network (EENet), which includes a text generation module and an image feature enhancement module that uses the generated text. While conventional works mainly focus on developing an efficient module for composing the given image and text query, EENet actively generates an additional textual description to enhance the image feature vector in the embedding space, inspired by the human ability to recognize an object using a visual sensor and prior textual information. In addition, a new training loss is introduced to ensure that images and the additional generated texts are well combined. Experimental results show that EENet considerably improves retrieval performance: on the Recall@1 metric, it improves over the baseline model by 3.4% on Fashion200k and 1.4% on MIT-States.
Pages: 49689-49705
Number of pages: 17
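
The Recall@1 metric named in the abstract can be illustrated with a small sketch. This is not the authors' evaluation code: it assumes each query has exactly one correct gallery index, that ranking uses cosine similarity between embedding vectors, and the function names and toy vectors below are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def recall_at_k(query_vecs, gallery_vecs, target_ids, k=1):
    """Fraction of queries whose target index is among the k most similar gallery items.

    Assumption: one correct target per query (benchmarks may instead accept
    any gallery item sharing the target's label).
    """
    hits = 0
    for query, target in zip(query_vecs, target_ids):
        # Rank gallery indices from most to least similar to the query.
        ranked = sorted(
            range(len(gallery_vecs)),
            key=lambda i: cosine(query, gallery_vecs[i]),
            reverse=True,
        )
        if target in ranked[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy 2-D embeddings (made up for illustration only).
gallery = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
queries = [[0.9, 0.1], [0.2, 0.8], [1.0, 0.1]]  # composed image+text query vectors
targets = [0, 1, 2]                              # index of the correct gallery item

r1 = recall_at_k(queries, gallery, targets, k=1)
r2 = recall_at_k(queries, gallery, targets, k=2)
```

In this toy setup the third query ranks its target second, so it counts as a miss at k=1 and a hit at k=2.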