Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach

Cited by: 0
Authors
Lee, Saehyung [1]
Yu, Sangwon [1]
Park, Junsung [1]
Yi, Jihun [1]
Yoon, Sungroh [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul, South Korea
[2] Seoul Natl Univ, Interdisciplinary Program Artificial Intelligence, Seoul, South Korea
Funding
National Research Foundation, Singapore;
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we primarily address the issue of dialogue-form context queries in the interactive text-to-image retrieval task. Our methodology, PlugIR, actively utilizes the general instruction-following capability of LLMs in two ways. First, by reformulating the dialogue-form context, we eliminate the need to fine-tune a retrieval model on existing visual dialogue data, thereby enabling the use of any arbitrary black-box model. Second, we construct the LLM questioner to generate non-redundant questions about the attributes of the target image, based on the information of retrieval candidate images in the current context. This approach mitigates the issues of noisiness and redundancy in the generated questions. Beyond our methodology, we propose a novel evaluation metric, Best log Rank Integral (BRI), for a comprehensive assessment of the interactive retrieval system. PlugIR demonstrates superior performance compared to both zero-shot and fine-tuned baselines on various benchmarks. Additionally, the two methodologies comprising PlugIR can be flexibly applied together or separately in various situations. Our code is available at https://github.com/Saehyung-Lee/PlugIR.
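The two components described in the abstract can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables `llm`, `retriever`, and `answer`, the prompt wording, and the exact-string redundancy check are all hypothetical stand-ins, and nothing here reflects the API of the PlugIR repository.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces for illustration only; not the PlugIR repository's API.
LLM = Callable[[str], str]                   # prompt -> completion
Retriever = Callable[[str, int], List[str]]  # (text query, k) -> top-k image ids
Answerer = Callable[[str], str]              # question -> user's answer


def reformulate(llm: LLM, caption: str, dialogue: List[Tuple[str, str]]) -> str:
    """Merge the initial description and the Q&A turns into one caption-style query."""
    turns = " ".join(f"Q: {q} A: {a}" for q, a in dialogue)
    return llm(f"Rewrite as one image caption.\nDescription: {caption}\nDialogue: {turns}")


def ask(llm: LLM, query: str, candidates: List[str], asked: List[str]) -> str:
    """Request a question about the target image, conditioned on current candidates."""
    return llm(
        "Ask one new question about the target image.\n"
        f"Query: {query}\nCandidates: {candidates}\nAlready asked: {asked}"
    )


def interactive_retrieval(
    llm: LLM, retriever: Retriever, answer: Answerer,
    caption: str, rounds: int = 3, k: int = 5,
) -> List[str]:
    dialogue: List[Tuple[str, str]] = []
    query = caption  # round 0 uses the plain initial description
    for _ in range(rounds):
        candidates = retriever(query, k)  # black-box retriever sees plain text only
        asked = [q for q, _ in dialogue]
        question = ask(llm, query, candidates, asked)
        if question in asked:  # assumed, simplistic redundancy filter
            continue
        dialogue.append((question, answer(question)))
        query = reformulate(llm, caption, dialogue)
    return retriever(query, k)
```

Because the retriever only ever receives a single caption-style string, any off-the-shelf text-to-image retrieval model could in principle be substituted without fine-tuning on dialogue data, which is the plug-and-play property the abstract emphasizes.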
Pages: 791 - 809
Page count: 19
Related papers
50 in total
  • [11] A PLUG-AND-PLAY DEEP IMAGE PRIOR
    Sun, Zhaodong
    Latorre, Fabian
    Sanchez, Thomas
    Cevher, Volkan
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8103 - 8107
  • [12] A Plug-and-Play Method for Controlled Text Generation
    Pascual, Damian
    Egressy, Beni
    Meister, Clara
    Cotterell, Ryan
    Wattenhofer, Roger
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 3973 - 3997
  • [13] Plug-and-Play Knowledge Injection for Pre-trained Language Models
    Zhang, Zhengyan
    Zeng, Zhiyuan
    Lin, Yankai
    Wang, Huadong
    Ye, Deming
    Xiao, Chaojun
    Han, Xu
    Liu, Zhiyuan
    Li, Peng
    Sun, Maosong
    Zhou, Jie
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2023): LONG PAPERS, VOL 1, 2023, : 10641 - 10656
  • [14] Diffusion models as plug-and-play priors
    Graikos, Alexandros
    Malkin, Nikolay
    Jojic, Nebojsa
    Samaras, Dimitris
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [15] The Infinite Index: Information Retrieval on Generative Text-To-Image Models
    Deckers, Niklas
    Froebe, Maik
    Kiesel, Johannes
    Pandolfo, Gianluca
    Schroeder, Christopher
    Stein, Benno
    Potthast, Martin
    PROCEEDINGS OF THE 2023 CONFERENCE ON HUMAN INFORMATION INTERACTION AND RETRIEVAL, CHIIR 2023, 2023, : 172 - 186
  • [16] A plug-and-play approach for malaria vaccination
    Robert S. Oakes
    Christopher M. Jewell
    Nature Nanotechnology, 2018, 13 : 1096 - 1097
  • [17] A plug-and-play approach for malaria vaccination
    Oakes, Robert S.
    Jewell, Christopher M.
    NATURE NANOTECHNOLOGY, 2018, 13 (12) : 1096 - 1097
  • [18] Information literacy: A plug-and-play approach
    Andretta, S
    Cutting, A
    LIBRI, 2003, 53 (03) : 202 - 209
  • [19] An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
    Chen, Liang
    Zhao, Haozhe
    Liu, Tianyu
    Bai, Shuai
    Lin, Junyang
    Zhou, Chang
    Chang, Baobao
    COMPUTER VISION - ECCV 2024, PT LXXXI, 2025, 15139 : 19 - 35
  • [20] Plug-and-play approach to class-adapted blind image deblurring
    Ljubenovic, Marina
    Figueiredo, Mario A. T.
    INTERNATIONAL JOURNAL ON DOCUMENT ANALYSIS AND RECOGNITION, 2019, 22 (02) : 79 - 97