Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

Times cited: 4

Authors:

Baldrati, Alberto [1,2]

Bertini, Marco [1]

Uricchio, Tiberio [3]

Del Bimbo, Alberto [1]

Affiliations:

[1] Univ Firenze, Viale Morgagni 65, I-50124 Florence, Italy

[2] Univ Pisa, Largo Bruno Pontecorvo 3, I-56127 Pisa, Italy

[3] Univ Macerata, Via Garibaldi 20, I-62100 Macerata, Italy

Source:

ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS | 2024 / Vol. 20 / No. 03

Funding:

European Union Horizon 2020;

Keywords:

Multimodal retrieval; combiner networks; vision language model;

DOI:

10.1145/3617597

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

Given a query composed of a reference image and a relative caption, the goal of Composed Image Retrieval is to retrieve images that are visually similar to the reference image while integrating the modifications expressed by the caption. Since recent research has demonstrated the efficacy of large-scale vision and language pre-trained (VLP) models in various tasks, we rely on features from the OpenAI CLIP model to tackle this task. We first perform a task-oriented fine-tuning of both CLIP encoders, using the element-wise sum of visual and textual features. Then, in the second stage, we train a Combiner network that learns to combine the image-text features, integrating the bimodal information and providing the combined features used to perform the retrieval. We use contrastive learning in both training stages. Taking the bare CLIP features as a baseline, experimental results show that the task-oriented fine-tuning and the carefully crafted Combiner network are highly effective and outperform more complex state-of-the-art approaches on FashionIQ and CIRR, two popular and challenging datasets for composed image retrieval. Code and pre-trained models are available at https://github.com/ABaldrati/CLIP4Cir.
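The first training stage summarized in the abstract (combining image and text features by element-wise sum and training with a contrastive objective) can be illustrated with a minimal plain-Python sketch. It assumes a batch-based contrastive loss, i.e. an InfoNCE-style cross-entropy in which the i-th combined query's positive is the i-th target image and the other targets in the batch act as negatives, and a hypothetical temperature of 0.01. The abstract states only that contrastive learning is used, so these specifics, the toy feature vectors standing in for CLIP encoder outputs, and all function names (`combine_by_sum`, `batch_contrastive_loss`, `retrieve`) are illustrative assumptions, not the authors' code.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def combine_by_sum(image_feat: Vector, text_feat: Vector) -> Vector:
    """Element-wise sum of reference-image and relative-caption features, then normalization."""
    return l2_normalize([a + b for a, b in zip(image_feat, text_feat)])


def batch_contrastive_loss(
    image_feats: List[Vector],
    text_feats: List[Vector],
    target_feats: List[Vector],
    temperature: float = 0.01,  # assumed value, for illustration only
) -> float:
    """InfoNCE-style loss: query i should score highest against target i in the batch."""
    queries = [combine_by_sum(i, t) for i, t in zip(image_feats, text_feats)]
    targets = [l2_normalize(t) for t in target_feats]
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        # Numerically stable log-sum-exp; the loss is -log softmax(logits)[i].
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)


def retrieve(image_feat: Vector, text_feat: Vector, gallery: List[Vector]) -> List[int]:
    """Rank gallery indices by cosine similarity to the combined query, best first."""
    q = combine_by_sum(image_feat, text_feat)
    scores = [dot(q, l2_normalize(g)) for g in gallery]
    return sorted(range(len(gallery)), key=lambda k: scores[k], reverse=True)
```

For example, `retrieve([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])` returns `[1, 0, 2]`, because the second gallery item is the one closest to the summed query. In the second stage described above, `combine_by_sum` would be replaced by the learned Combiner network, which this sketch does not model.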

Pages: 24