Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models

Times cited: 2
Authors
Zhu, Hongyi [1]
Huang, Jia-Hong [1]
Rudinac, Stevan [1]
Kanoulas, Evangelos [1]
Affiliation
[1] Univ Amsterdam, Amsterdam, Netherlands
Keywords
Interactive Image Retrieval; Query Rewriting; Vision Language Models; Large Language Models; Information
DOI
10.1145/3652583.3658032
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image search is a pivotal task in multimedia and computer vision, with applications ranging from internet search to medical diagnostics. Conventional image search systems accept textual or visual queries and retrieve the most relevant candidates from a database. However, prevalent methods often rely on single-turn procedures, which can introduce inaccuracies and limit recall. These methods also face challenges such as vocabulary mismatch and the semantic gap, which constrain their overall effectiveness. To address these issues, we propose an interactive image retrieval system that refines queries based on user relevance feedback in a multi-turn setting. The system incorporates an image captioner based on a vision language model (VLM) to enrich text-based queries, making them more informative with each iteration. We further introduce a denoiser based on a large language model (LLM) that refines text-based query expansions, mitigating inaccuracies in the image descriptions produced by captioning models. To evaluate the system, we curate a new dataset by adapting the MSR-VTT video retrieval dataset to the image retrieval task, providing multiple relevant ground-truth images for each query. Comprehensive experiments against baseline methods validate the effectiveness of the proposed system, which achieves state-of-the-art performance with a notable 10% improvement in recall. Our contributions include a novel interactive image retrieval system, the integration of an LLM-based denoiser, a carefully designed evaluation dataset, and thorough experimental validation.
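The multi-turn feedback loop described in the abstract (retrieve, collect relevance feedback, caption the relevant images, denoise the captions, expand the query, and measure recall) can be illustrated with a toy sketch. This is not the authors' implementation. All names here (tokenize, retrieve, recall_at_k, interactive_search, caption_fn, denoise_fn, STOPWORDS) are hypothetical. The retriever is a simple Jaccard token-overlap ranker standing in for whatever text-image model the paper uses. The captioner is a lookup of precomputed captions standing in for a VLM. The denoiser defaults to an identity hook where an LLM call would go, and the user is simulated by checking the ground-truth set.

```python
from typing import Callable, Dict, List, Set, Tuple

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "on", "in", "at", "with", "of"}


def tokenize(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def retrieve(query: str, captions: Dict[str, str], k: int) -> List[str]:
    """Stand-in retriever: rank images by Jaccard overlap of query and caption tokens."""
    q = tokenize(query)

    def score(img_id: str) -> float:
        c = tokenize(captions[img_id])
        union = q | c
        return len(q & c) / len(union) if union else 0.0

    # Sort by descending score; break ties by image id for determinism.
    return sorted(captions, key=lambda i: (-score(i), i))[:k]


def recall_at_k(ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of ground-truth relevant images that appear in the ranked list."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def interactive_search(
    query: str,
    captions: Dict[str, str],
    relevant: Set[str],
    caption_fn: Callable[[str], str],
    denoise_fn: Callable[[str, str], str] = lambda query, caption: caption,
    k: int = 2,
    turns: int = 3,
) -> Tuple[str, List[float]]:
    """Multi-turn loop: retrieve, collect feedback, expand the query with denoised captions."""
    history: List[float] = []
    used: Set[str] = set()  # feedback images whose captions were already added
    for _ in range(turns):
        ranked = retrieve(query, captions, k)
        history.append(recall_at_k(ranked, relevant))
        # Simulated user: marks retrieved images that belong to the ground-truth set.
        for img_id in ranked:
            if img_id in relevant and img_id not in used:
                used.add(img_id)
                # caption_fn stands in for a VLM captioner; denoise_fn for an LLM denoiser.
                query = query + " " + denoise_fn(query, caption_fn(img_id))
    return query, history


if __name__ == "__main__":
    toy_captions = {
        "img1": "a dog running on the sand at the beach",
        "img2": "a puppy playing in the sand",
        "img3": "a cat sleeping on a sofa",
        "img4": "a dog on a leash walking in the busy city",
    }
    _, recalls = interactive_search(
        "dog beach", toy_captions, {"img1", "img2"},
        caption_fn=lambda i: toy_captions[i], k=2, turns=3,
    )
    print(recalls)  # [0.5, 1.0, 1.0]
```

In this toy run, the first query misses img2 because its caption shares no content words with "dog beach". After img1 is marked relevant, its caption adds "sand" to the query, which pulls img2 into the top 2 and raises recall from 0.5 to 1.0. Passing a custom denoise_fn, for example one that prompts an LLM to drop caption details inconsistent with the query, is where the paper's denoiser would plug in.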
Pages: 978-987
Number of pages: 10