Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models

Cited by: 2
Authors
Zhu, Hongyi [1]
Huang, Jia-Hong [1]
Rudinac, Stevan [1]
Kanoulas, Evangelos [1]
Affiliations
[1] Univ Amsterdam, Amsterdam, Netherlands
Keywords
Interactive Image Retrieval; Query Rewriting; Vision Language Models; Large Language Models; INFORMATION
DOI
10.1145/3652583.3658032
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image search is a pivotal task in multimedia and computer vision, with applications across diverse domains ranging from internet search to medical diagnostics. Conventional image search systems accept textual or visual queries and retrieve the most relevant candidate results from the database. However, prevalent methods often rely on single-turn procedures, introducing potential inaccuracies and limited recall. These methods also face challenges such as vocabulary mismatch and the semantic gap, which constrain their overall effectiveness. To address these issues, we propose an interactive image retrieval system capable of refining queries based on user relevance feedback in a multi-turn setting. This system incorporates a vision language model (VLM) based image captioner to enhance the quality of text-based queries, resulting in more informative queries with each iteration. Moreover, we introduce a large language model (LLM) based denoiser to refine text-based query expansions, mitigating inaccuracies in image descriptions generated by captioning models. To evaluate our system, we curate a new dataset by adapting the MSR-VTT video retrieval dataset to the image retrieval task, offering multiple relevant ground truth images for each query. Through comprehensive experiments, we validate the effectiveness of our proposed system against baseline methods, achieving state-of-the-art performance with a notable 10% improvement in terms of recall. Our contributions encompass the development of an innovative interactive image retrieval system, the integration of an LLM-based denoiser, the curation of a meticulously designed evaluation dataset, and thorough experimental validation.
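The multi-turn loop described in the abstract (retrieve, collect relevance feedback, caption the relevant images, denoise the expanded query, retrieve again) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the token-overlap ranker stands in for a real retrieval model, `toy_captioner` stands in for the VLM captioner, `toy_denoiser` stands in for the LLM denoiser, and every name, the toy index, and the simulated feedback rule are assumptions made only for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokenization (a stand-in for a real text encoder)."""
    return set(text.lower().split())


def retrieve(query: str, index: Dict[str, str], k: int) -> List[str]:
    """Rank images by token overlap with the query; ties are broken by image id."""
    q = tokenize(query)
    ranked = sorted(index, key=lambda img: (-len(q & tokenize(index[img])), img))
    return ranked[:k]


def recall_at_k(ranked: List[str], relevant: Set[str]) -> float:
    """Fraction of the relevant images that appear in the returned list."""
    return len(set(ranked) & relevant) / len(relevant) if relevant else 0.0


def interactive_search(
    query: str,
    index: Dict[str, str],
    relevant: Set[str],
    captioner: Callable[[str], str],
    denoiser: Callable[[str, str], str],
    k: int = 3,
    turns: int = 3,
) -> Tuple[str, List[float]]:
    """Run several retrieval turns, rewriting the query after each one."""
    history: List[float] = []
    for _ in range(turns):
        ranked = retrieve(query, index, k)
        history.append(recall_at_k(ranked, relevant))
        # Simulated relevance feedback: the user marks every relevant hit.
        liked = [img for img in ranked if img in relevant]
        for img in liked:
            # Expand the query with a caption, then clean up the expansion.
            query = denoiser(query, captioner(img))
    return query, history


# Hypothetical toy data: image id -> text describing the image.
INDEX = {
    "img1": "a man plays guitar on stage",
    "img2": "a band plays guitar and drums on stage",
    "img3": "a cat sleeps on a sofa",
    "img4": "drums and lights at a concert",
}
RELEVANT = {"img1", "img2", "img4"}


def toy_captioner(image_id: str) -> str:
    """Stub for the VLM captioner: returns the stored description."""
    return INDEX[image_id]


def toy_denoiser(query: str, caption: str) -> str:
    """Stub for the LLM denoiser: drops filler words and duplicate terms."""
    stop = {"a", "on", "and", "at"}
    words = [w for w in (query + " " + caption).lower().split() if w not in stop]
    return " ".join(dict.fromkeys(words))


final_query, history = interactive_search(
    "guitar", INDEX, RELEVANT, toy_captioner, toy_denoiser, k=3, turns=3
)
print(final_query)
print(history)
```

On this toy index the first turn finds two of the three relevant images, and the rewritten query recovers the third in the next turn, which is the kind of recall gain the multi-turn setting is meant to provide. A real system would replace both stubs with model calls and the overlap score with a learned text-image similarity.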
Pages: 978-987
Page count: 10
Related papers
50 in total
  • [11] Rewriting Conversational Utterances with Instructed Large Language Models
    Galimzhanova, Elnara
    Muntean, Cristina Ioana
    Nardini, Franco Maria
    Perego, Raffaele
    Rocchietti, Guido
    2023 IEEE INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE AND INTELLIGENT AGENT TECHNOLOGY, WI-IAT, 2023, : 56 - 63
  • [12] Retrieval augmentation of large language models for lay language generation
    Guo, Yue
    Qiu, Wei
    Leroy, Gondy
    Wang, Sheng
    Cohen, Trevor
    JOURNAL OF BIOMEDICAL INFORMATICS, 2024, 149
  • [14] InteraRec: Interactive Recommendations Using Multimodal Large Language Models
    Karra, Saketh Reddy
    Tulabandhula, Theja
    TRENDS AND APPLICATIONS IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2024 WORKSHOPS, RAFDA AND IWTA, 2024, 14658 : 32 - 43
  • [15] Context-Driven Interactive Query Simulations Based on Generative Large Language Models
    Engelmann, Bjoern
    Breuer, Timo
    Friese, Jana Isabelle
    Schaer, Philipp
    Fuhr, Norbert
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2024, PT II, 2024, 14609 : 173 - 188
  • [16] Enhancing Financial Sentiment Analysis via Retrieval Augmented Large Language Models
    Zhang, Boyu
    Yang, Hongyang
    Zhou, Tianyu
    Babar, Ali
    Liu, Xiao-Yang
    PROCEEDINGS OF THE 4TH ACM INTERNATIONAL CONFERENCE ON AI IN FINANCE, ICAIF 2023, 2023, : 349 - 356
  • [17] Synthetic Query Generation using Large Language Models for Virtual Assistants
    Sannigrahi, Sonal
    Fraga-Silva, Thiago
    Oualil, Youssef
    Van Gysel, Christophe
    PROCEEDINGS OF THE 47TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2024, 2024, : 2837 - 2841
  • [18] Enhancing Genetic Improvement Mutations Using Large Language Models
    Brownlee, Alexander E.I.
    Callan, James
    Even-Mendoza, Karine
    Geiger, Alina
    Hanna, Carol
    Petke, Justyna
    Sarro, Federica
    Sobania, Dominik
    arXiv, 2023
  • [19] Using Language Models and Topic Models for XML Retrieval
    Huang, Fang
    FOCUSED ACCESS TO XML DOCUMENTS, 2008, 4862 : 94 - 102
  • [20] Enhancing Genetic Improvement Mutations Using Large Language Models
    Brownlee, Alexander E. I.
    Callan, James
    Even-Mendoza, Karine
    Geiger, Alina
    Hanna, Carol
    Petke, Justyna
    Sarro, Federica
    Sobania, Dominik
    SEARCH-BASED SOFTWARE ENGINEERING, SSBSE 2023, 2024, 14415 : 153 - 159