Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

Cited by: 4
Authors
Baldrati, Alberto [1, 2]
Bertini, Marco [1]
Uricchio, Tiberio [3]
Del Bimbo, Alberto [1]
Affiliations
[1] Univ Firenze, Viale Morgagni 65, I-50124 Florence, Italy
[2] Univ Pisa, Largo Bruno Pontecorvo 3, I-56127 Pisa, Italy
[3] Univ Macerata, Via Garibaldi 20, I-62100 Macerata, Italy
Funding
EU Horizon 2020
Keywords
Multimodal retrieval; combiner networks; vision language model
DOI
10.1145/3617597
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Given a query composed of a reference image and a relative caption, the goal of Composed Image Retrieval is to retrieve images that are visually similar to the reference image and that integrate the modifications expressed by the caption. Since recent research has demonstrated the efficacy of large-scale vision and language pre-trained (VLP) models in various tasks, we rely on features from the OpenAI CLIP model to tackle this task. We first perform a task-oriented fine-tuning of both CLIP encoders using the element-wise sum of visual and textual features. Then, in the second stage, we train a Combiner network that learns to combine the image-text features, integrating the bimodal information and providing the combined features used to perform the retrieval. We use contrastive learning in both stages of training. Starting from the bare CLIP features as a baseline, experimental results show that the task-oriented fine-tuning and the carefully crafted Combiner network are highly effective and outperform more complex state-of-the-art approaches on FashionIQ and CIRR, two popular and challenging datasets for composed image retrieval. Code and pre-trained models are available at https://github.com/ABaldrati/CLIP4Cir.
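As a rough illustration of the first stage described in the abstract, the sketch below combines image and text features by element-wise sum and scores the result with a batch-based contrastive loss. It is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the random vectors stand in for CLIP encoder outputs, and the function names, feature dimension, and temperature value are all hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each vector to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def combine_by_sum(image_feats, text_feats):
    """Element-wise sum of image and text features, re-normalized."""
    return l2_normalize(image_feats + text_feats)


def batch_contrastive_loss(query_feats, target_feats, temperature=0.01):
    """Cross-entropy over query/target similarities within a batch.

    Row i of query_feats is treated as matching row i of target_feats;
    every other row in the batch acts as a negative.
    The temperature value is an assumption made for this sketch.
    """
    logits = query_feats @ target_feats.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


# Random stand-ins for encoder outputs (assumption: batch of 4, dimension 8).
rng = np.random.default_rng(0)
batch_size, dim = 4, 8
reference_image_feats = l2_normalize(rng.normal(size=(batch_size, dim)))
caption_feats = l2_normalize(rng.normal(size=(batch_size, dim)))

queries = combine_by_sum(reference_image_feats, caption_feats)

# Toy targets: aligned targets equal the queries; shuffled targets are misaligned.
loss_aligned = batch_contrastive_loss(queries, queries)
loss_shuffled = batch_contrastive_loss(queries, np.roll(queries, 1, axis=0))
print(loss_aligned, loss_shuffled)
```

In this toy setup the aligned targets give a lower loss than the shuffled ones, which is the behaviour a contrastive objective of this kind rewards. The Combiner network of the second stage is not sketched here because the abstract does not describe its architecture.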
Pages: 24