Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

Cited by: 4
Authors
Baldrati, Alberto [1, 2]
Bertini, Marco [1]
Uricchio, Tiberio [3]
Del Bimbo, Alberto [1]
Affiliations
[1] Univ Firenze, Viale Morgagni 65, I-50124 Florence, Italy
[2] Univ Pisa, Largo Bruno Pontecorvo 3, I-56127 Pisa, Italy
[3] Univ Macerata, Via Garibaldi 20, I-62100 Macerata, Italy
Funding
European Union Horizon 2020
Keywords
Multimodal retrieval; combiner networks; vision language model
DOI
10.1145/3617597
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Given a query composed of a reference image and a relative caption, the goal of Composed Image Retrieval is to retrieve images that are visually similar to the reference image and that integrate the modifications expressed by the caption. Because recent research has demonstrated the efficacy of large-scale vision and language pre-trained (VLP) models in various tasks, we rely on features from the OpenAI CLIP model to tackle this task. In the first stage, we perform a task-oriented fine-tuning of both CLIP encoders using the element-wise sum of visual and textual features. In the second stage, we train a Combiner network that learns to combine the image and text features, integrating the bimodal information and producing the combined features used for retrieval. We use contrastive learning in both stages of training. Starting from the bare CLIP features as a baseline, experimental results show that the task-oriented fine-tuning and the carefully crafted Combiner network are highly effective and outperform more complex state-of-the-art approaches on FashionIQ and CIRR, two popular and challenging datasets for composed image retrieval. Code and pre-trained models are available at https://github.com/ABaldrati/CLIP4Cir.
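The first stage described in the abstract (element-wise sum of image and text features, trained with a contrastive objective) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy 3-dimensional vectors stand in for CLIP features, the function names are hypothetical, the temperature of 0.1 is an assumed value, and the loss is a generic batch-wise InfoNCE-style formulation. The learned Combiner network of the second stage is not modeled here; it would take the place of `sum_combine`.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sum_combine(image_feat, text_feat):
    """Hypothetical helper: element-wise sum of the two features, then L2-normalize."""
    return l2_normalize([a + b for a, b in zip(image_feat, text_feat)])

def contrastive_loss(queries, targets, temperature=0.1):
    """InfoNCE-style loss: the i-th query should match the i-th target.

    temperature=0.1 is an assumption chosen for this toy example.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

def retrieve(query, index_feats):
    """Return candidate indices sorted by similarity to the query, best first."""
    sims = [dot(query, f) for f in index_feats]
    return sorted(range(len(index_feats)), key=lambda i: sims[i], reverse=True)

# Toy stand-ins for reference-image, caption, and target-image features.
ref_images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
captions = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
targets = [l2_normalize([1.0, 0.0, 1.0]), l2_normalize([0.0, 1.0, -1.0])]

queries = [sum_combine(i, t) for i, t in zip(ref_images, captions)]
aligned_loss = contrastive_loss(queries, targets)
shuffled_loss = contrastive_loss(queries, targets[::-1])
ranking = retrieve(queries[0], targets)
print(aligned_loss, shuffled_loss, ranking)
```

In this toy setup the combined queries coincide with their targets, so the loss on correctly paired batches is far lower than on mismatched pairs, which is the signal a contrastive objective uses to pull matching query and target features together.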
Pages: 24