Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models

Cited by: 2
Authors
Zhu, Hongyi [1]
Huang, Jia-Hong [1]
Rudinac, Stevan [1]
Kanoulas, Evangelos [1]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
Keywords
Interactive Image Retrieval; Query Rewriting; Vision Language Models; Large Language Models; INFORMATION;
DOI
10.1145/3652583.3658032
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Image search stands as a pivotal task in multimedia and computer vision, finding applications across diverse domains, ranging from internet search to medical diagnostics. Conventional image search systems operate by accepting textual or visual queries and retrieving the most relevant candidate results from the database. However, prevalent methods often rely on single-turn procedures, introducing potential inaccuracies and limited recall. These methods also face challenges such as vocabulary mismatch and the semantic gap, which constrain their overall effectiveness. To address these issues, we propose an interactive image retrieval system capable of refining queries based on user relevance feedback in a multi-turn setting. This system incorporates a vision language model (VLM) based image captioner to enhance the quality of text-based queries, resulting in more informative queries with each iteration. Moreover, we introduce a large language model (LLM) based denoiser to refine text-based query expansions, mitigating inaccuracies in image descriptions generated by captioning models. To evaluate our system, we curate a new dataset by adapting the MSR-VTT video retrieval dataset to the image retrieval task, offering multiple relevant ground truth images for each query. Through comprehensive experiments, we validate the effectiveness of our proposed system against baseline methods, achieving state-of-the-art performance with a notable 10% improvement in terms of recall. Our contributions encompass the development of an innovative interactive image retrieval system, the integration of an LLM-based denoiser, the curation of a meticulously designed evaluation dataset, and thorough experimental validation.
Pages: 978-987
Number of pages: 10
Related papers
50 in total
  • [1] Query Rewriting for Retrieval-Augmented Large Language Models
    Ma, Xinbei
    Gong, Yeyun
    He, Pengcheng
    Zhao, Hai
    Duan, Nan
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 5303 - 5315
  • [2] A Framework for Enhancing Statute Law Retrieval Using Large Language Models
    Pham, Trang Ngoc Anh
    Do, Dinh-Truong
    Nguyen, Minh Le
    NEW FRONTIERS IN ARTIFICIAL INTELLIGENCE, JSAI-ISAI 2024, 2024, 14741 : 247 - 259
  • [3] An Interactive Multi-modal Query Answering System with Retrieval-Augmented Large Language Models
    Wang, Mengzhao
    Wu, Haotian
    Ke, Xiangyu
    Gao, Yunjun
    Xu, Xiaoliang
    Chen, Lu
    PROCEEDINGS OF THE VLDB ENDOWMENT, 2024, 17 (12): : 4333 - 4336
  • [4] Prompting Is Programming: A Query Language for Large Language Models
    Beurer-Kellner, Luca
    Fischer, Marc
    Vechev, Martin
    PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2023, 7 (PLDI):
  • [5] Interactive Text-to-Image Retrieval with Large Language Models: A Plug-and-Play Approach
    Lee, Saehyung
    Yu, Sangwon
    Park, Junsung
    Yi, Jihun
    Yoon, Sungroh
    PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS, 2024, : 791 - 809
  • [6] Using Large Language Models for Math Information Retrieval
    Mansouri, Behrooz
    Maarefdoust, Reihaneh
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2693 - 2697
  • [7] Attention Prompting on Image for Large Vision-Language Models
    Yu, Runpeng
    Yu, Weihao
    Wang, Xinchao
    COMPUTER VISION - ECCV 2024, PT XXX, 2025, 15088 : 251 - 268
  • [8] Boosting legal case retrieval by query content selection with large language models
    Zhou, Youchao
    Huang, Heyan
    Wu, Zhijing
    ANNUAL INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL IN THE ASIA PACIFIC REGION, SIGIR-AP 2023, 2023, : 176 - 184
  • [9] Combining Language Models with NLP and Interactive Query Expansion
    SanJuan, Eric
    Ibekwe-SanJuan, Fidelia
    FOCUSED RETRIEVAL AND EVALUATION, 2010, 6203 : 122 - +
  • [10] Interactive computer-aided diagnosis on medical image using large language models
    Sheng Wang
    Zihao Zhao
    Xi Ouyang
    Tianming Liu
    Qian Wang
    Dinggang Shen
    Communications Engineering, 3 (1):