Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

Cited by: 4

Authors:

Baldrati, Alberto [1,2]

Bertini, Marco [1]

Uricchio, Tiberio [3]

Del Bimbo, Alberto [1]

Affiliations:

[1] Univ Firenze, Viale Morgagni 65, I-50124 Florence, Italy

[2] Univ Pisa, Largo Bruno Pontecorvo 3, I-56127 Pisa, Italy

[3] Univ Macerata, Via Garibaldi 20, I-62100 Macerata, Italy

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2024 / Vol. 20 / Issue 03

Funding:

European Union Horizon 2020;

Keywords:

Multimodal retrieval; combiner networks; vision language model;

DOI:

10.1145/3617597

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Given a query composed of a reference image and a relative caption, the goal of Composed Image Retrieval is to retrieve images that are visually similar to the reference image and that integrate the modifications expressed by the caption. Given that recent research has demonstrated the efficacy of large-scale vision and language pre-trained (VLP) models in various tasks, we rely on features from the OpenAI CLIP model to tackle this task. In the first stage, we perform a task-oriented fine-tuning of both CLIP encoders using the element-wise sum of visual and textual features. Then, in the second stage, we train a Combiner network that learns to combine the image and text features, integrating the bimodal information and producing the combined features used to perform the retrieval. We use contrastive learning in both stages of training. Starting from the bare CLIP features as a baseline, experimental results show that the task-oriented fine-tuning and the carefully crafted Combiner network are highly effective and outperform more complex state-of-the-art approaches on FashionIQ and CIRR, two popular and challenging datasets for composed image retrieval. Code and pre-trained models are available at https://github.com/ABaldrati/CLIP4Cir.
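
As a rough illustration of the first-stage idea described in the abstract, the following NumPy sketch combines image and text features by element-wise sum, scores the combined query against candidate images, and computes a batch-wise contrastive loss. It is not the authors' implementation (that is in the repository linked above). The function names, the L2 normalization before and after the sum, the softmax cross-entropy form of the loss, and the temperature value are all assumptions made for this example; random vectors stand in for CLIP features, and the second-stage Combiner network is not modeled.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def compose_by_sum(image_feats, text_feats):
    """Element-wise sum of reference-image and caption features.

    Normalizing the inputs and the result is an assumption of this sketch.
    """
    return l2_normalize(l2_normalize(image_feats) + l2_normalize(text_feats))


def batch_contrastive_loss(query_feats, target_feats, temperature=0.01):
    """Softmax cross-entropy over in-batch similarities.

    Row i of query_feats is paired with row i of target_feats; the other
    rows in the batch act as negatives. The temperature is a hypothetical value.
    """
    logits = query_feats @ target_feats.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))


def retrieve(query_feats, index_feats, k=5):
    """Return indices of the k most similar index rows for each query."""
    sims = query_feats @ index_feats.T
    return np.argsort(-sims, axis=1)[:, :k]


# Random stand-ins for CLIP features: 4 queries, 10 candidates, 8 dimensions.
rng = np.random.default_rng(0)
ref_image = rng.normal(size=(4, 8))
caption = rng.normal(size=(4, 8))
candidates = l2_normalize(rng.normal(size=(10, 8)))

queries = compose_by_sum(ref_image, caption)
top_k = retrieve(queries, candidates, k=3)
loss = batch_contrastive_loss(queries, candidates[:4])
```

In this sketch the loss decreases as each combined query moves closer to its paired target image and away from the other targets in the batch, which is the general behavior expected of a contrastive objective.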

Pages: 24
