Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

Cited by: 4
Authors
Baldrati, Alberto [1, 2]
Bertini, Marco [1 ]
Uricchio, Tiberio [3 ]
Del Bimbo, Alberto [1 ]
Affiliations
[1] Univ Firenze, Viale Morgagni 65, I-50124 Florence, Italy
[2] Univ Pisa, Largo Bruno Pontecorvo 3, I-56127 Pisa, Italy
[3] Univ Macerata, Via Garibaldi 20, I-62100 Macerata, Italy
Funding
European Union Horizon 2020
Keywords
Multimodal retrieval; combiner networks; vision language model
DOI
10.1145/3617597
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Given a query composed of a reference image and a relative caption, the goal of Composed Image Retrieval is to retrieve images that are visually similar to the reference image and that integrate the modifications expressed by the caption. Because recent research has demonstrated the efficacy of large-scale vision and language pre-trained (VLP) models in various tasks, we rely on features from the OpenAI CLIP model to tackle this task. In the first stage, we perform a task-oriented fine-tuning of both CLIP encoders using the element-wise sum of visual and textual features. In the second stage, we train a Combiner network that learns to combine the image and text features, integrating the bimodal information and producing the combined features used to perform the retrieval. We use contrastive learning in both stages of training. Starting from the bare CLIP features as a baseline, experimental results show that the task-oriented fine-tuning and the carefully crafted Combiner network are highly effective and outperform more complex state-of-the-art approaches on FashionIQ and CIRR, two popular and challenging datasets for composed image retrieval. Code and pre-trained models are available at https://github.com/ABaldrati/CLIP4Cir.
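The first-stage idea in the abstract (an element-wise sum of image and text features, trained with a contrastive objective) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors stand in for CLIP features, and the L2 normalisation, the InfoNCE-style form of the loss, the temperature value, and every function name below are assumptions made for illustration only. The Combiner network of the second stage is not sketched because the abstract does not describe its architecture.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumed here; common for CLIP features)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def combine_by_sum(image_feat, text_feat):
    """Element-wise sum of reference-image and caption features, then normalised."""
    return l2_normalize([i + t for i, t in zip(image_feat, text_feat)])


def contrastive_loss(queries, targets, temperature=0.1):
    """InfoNCE-style batch loss: queries[i] should match targets[i];
    the other targets in the batch act as negatives.
    The temperature is a hypothetical value, not taken from the paper."""
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def retrieve(query, index):
    """Rank index entries by decreasing similarity to the combined query."""
    return sorted(range(len(index)), key=lambda j: dot(query, index[j]), reverse=True)


# Toy features standing in for encoder outputs (hypothetical values).
ref_images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
captions = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
targets = [l2_normalize([1.0, 0.0, 1.0]), l2_normalize([0.0, 1.0, -1.0])]

queries = [combine_by_sum(i, t) for i, t in zip(ref_images, captions)]
loss = contrastive_loss(queries, targets)
ranking = retrieve(queries[0], targets)
print(loss, ranking)
```

In this toy batch each combined query coincides with its own target, so the loss is close to zero and the matching target is ranked first; pairing the queries with the wrong targets makes the loss much larger, which is the signal a fine-tuning step would use.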
Pages: 24