Composed Image Retrieval using Contrastive Learning and Task-oriented CLIP-based Features

Cited by: 4
Authors
Baldrati, Alberto [1, 2]
Bertini, Marco [1]
Uricchio, Tiberio [3]
Del Bimbo, Alberto [1]
Affiliations
[1] Univ Firenze, Viale Morgagni 65, I-50124 Florence, Italy
[2] Univ Pisa, Largo Bruno Pontecorvo 3, I-56127 Pisa, Italy
[3] Univ Macerata, Via Garibaldi 20, I-62100 Macerata, Italy
Funding
EU Horizon 2020
Keywords
Multimodal retrieval; combiner networks; vision language model;
DOI
10.1145/3617597
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Given a query composed of a reference image and a relative caption, the goal of Composed Image Retrieval is to retrieve images that are visually similar to the reference image and that integrate the modifications expressed by the caption. Because recent research has demonstrated the efficacy of large-scale vision and language pre-trained (VLP) models in various tasks, we rely on features from the OpenAI CLIP model to tackle this task. In the first stage, we perform a task-oriented fine-tuning of both CLIP encoders using the element-wise sum of visual and textual features. In the second stage, we train a Combiner network that learns to combine the image and text features, integrating the bimodal information and producing the combined features used to perform the retrieval. We use contrastive learning in both stages of training. Starting from the bare CLIP features as a baseline, experimental results show that the task-oriented fine-tuning and the carefully crafted Combiner network are highly effective and outperform more complex state-of-the-art approaches on FashionIQ and CIRR, two popular and challenging datasets for composed image retrieval. Code and pre-trained models are available at https://github.com/ABaldrati/CLIP4Cir.
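The first-stage idea described in the abstract (element-wise sum of image and text features, trained with a contrastive objective) can be illustrated with a minimal sketch. This is not the authors' code: the toy 3-dimensional vectors stand in for CLIP features, the helper names (`combine_by_sum`, `contrastive_loss`, `retrieve`) are hypothetical, and the temperature value and the batch-wise InfoNCE form of the loss are assumptions, since the abstract does not specify them. The learned Combiner network of the second stage is not shown.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def combine_by_sum(image_feat, text_feat):
    """Element-wise sum of reference-image and caption features, then normalize."""
    return l2_normalize([a + b for a, b in zip(image_feat, text_feat)])


def contrastive_loss(queries, targets, temperature=0.01):
    """Batch-wise InfoNCE (assumed form): query i should match target i.

    The other targets in the batch act as negatives.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [dot(q, t) / temperature for t in targets]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(queries)


def retrieve(query, gallery):
    """Return gallery indices sorted by descending similarity to the query."""
    scores = [dot(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


# Toy stand-ins for CLIP features (hypothetical values, 3 dimensions).
ref_images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
captions = [[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
targets = [l2_normalize([1.0, 0.0, 1.0]), l2_normalize([0.0, 1.0, -1.0])]

queries = [combine_by_sum(img, txt) for img, txt in zip(ref_images, captions)]
loss = contrastive_loss(queries, targets)
ranking = retrieve(queries[0], targets)
```

In this toy setup each combined query coincides with its own target, so the loss is close to zero and the first query ranks its own target first. With real encoders the features would come from the fine-tuned CLIP image and text towers instead of hand-written vectors.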
Pages: 24