Enhancing Interactive Image Retrieval With Query Rewriting Using Large Language Models and Vision Language Models

被引:2
|
作者
Zhu, Hongyi [1 ]
Huang, Jia-Hong [1 ]
Rudinac, Stevan [1 ]
Kanoulas, Evangelos [1 ]
机构
[1] Univ Amsterdam, Amsterdam, Netherlands
关键词
Interactive Image Retrieval; Query Rewriting; Vision Language Models; Large Language Models; INFORMATION;
D O I
10.1145/3652583.3658032
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Image search stands as a pivotal task in multimedia and computer vision, finding applications across diverse domains, ranging from internet search to medical diagnostics. Conventional image search systems operate by accepting textual or visual queries, retrieving the top-relevant candidate results from the database. However, prevalent methods often rely on single-turn procedures, introducing potential inaccuracies and limited recall. These methods also face the challenges, such as vocabulary mismatch and the semantic gap, constraining their overall effectiveness. To address these issues, we propose an interactive image retrieval system capable of refining queries based on user relevance feedback in a multi-turn setting. This system incorporates a vision language model (VLM) based image captioner to enhance the quality of text-based queries, resulting in more informative queries with each iteration. Moreover, we introduce a large language model (LLM) based denoiser to refine text-based query expansions, mitigating inaccuracies in image descriptions generated by captioning models. To evaluate our system, we curate a new dataset by adapting the MSR-VTT video retrieval dataset to the image retrieval task, offering multiple relevant ground truth images for each query. Through comprehensive experiments, we validate the effectiveness of our proposed system against baseline methods, achieving state-of-the-art performance with a notable 10% improvement in terms of recall. Our contributions encompass the development of an innovative interactive image retrieval system, the integration of an LLM-based denoiser, the curation of a meticulously designed evaluation dataset, and thorough experimental validation.
引用
收藏
页码:978 / 987
页数:10
相关论文
共 50 条
  • [41] Leveraging Large Language Models for Sensor Data Retrieval
    Berenguer, Alberto
    Morejon, Adriana
    Tomas, David
    Mazon, Jose-Norberto
    APPLIED SCIENCES-BASEL, 2024, 14 (06):
  • [42] Enhancing the Readability of Preoperative Patient Instructions Using Large Language Models
    Hong, Hyo Jung
    Schmiesing, Clifford A.
    Goodell, Alex J.
    ANESTHESIOLOGY, 2024, 141 (03) : 608 - 610
  • [43] Enhancing Network Management Using Code Generated by Large Language Models
    Mani, Sathiya Kumaran
    Zhou, Yajie
    Hsieh, Kevin
    Segarra, Santiago
    Eberl, Trevor
    Azulai, Eliran
    Frizler, Ido
    Chandra, Ranveer
    Kandula, Srikanth
    PROCEEDINGS OF THE 22ND ACM WORKSHOP ON HOT TOPICS IN NETWORKS, HOTNETS 2023, 2023, : 196 - 204
  • [44] Corpus-Steered Query Expansion with Large Language Models
    Lei, Yibin
    Cao, Yu
    Zhou, Tianyi
    Shen, Tao
    Yates, Andrew
    PROCEEDINGS OF THE 18TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 2: SHORT PAPERS, 2024, : 393 - 401
  • [45] Semantic Mechanical Search with Large Vision and Language Models
    Sharma, Satvik
    Huang, Huang
    Shivakumar, Kaushik
    Chen, Lawrence Yunliang
    Hoque, Ryan
    Ichter, Brian
    Goldberg, Ken
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [46] Detecting and Preventing Hallucinations in Large Vision Language Models
    Gunjal, Anisha
    Yin, Jihan
    Bas, Erhan
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 16, 2024, : 18135 - 18143
  • [47] Layered Query Retrieval: An Adaptive Framework for Retrieval-Augmented Generation in Complex Question Answering for Large Language Models
    Huang, Jie
    Wang, Mo
    Cui, Yunpeng
    Liu, Juan
    Chen, Li
    Wang, Ting
    Li, Huan
    Wu, Jinming
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [48] Using large language models in psychology
    Demszky, Dorottya
    Yang, Diyi
    Yeager, David
    Bryan, Christopher
    Clapper, Margarett
    Chandhok, Susannah
    Eichstaedt, Johannes
    Hecht, Cameron
    Jamieson, Jeremy
    Johnson, Meghann
    Jones, Michaela
    Krettek-Cobb, Danielle
    Lai, Leslie
    Jonesmitchell, Nirel
    Ong, Desmond
    Dweck, Carol
    Gross, James
    Pennebaker, James
    NATURE REVIEWS PSYCHOLOGY, 2023, 2 (11): : 688 - 701
  • [49] Using large language models in psychology
    Dorottya Demszky
    Diyi Yang
    David S. Yeager
    Christopher J. Bryan
    Margarett Clapper
    Susannah Chandhok
    Johannes C. Eichstaedt
    Cameron Hecht
    Jeremy Jamieson
    Meghann Johnson
    Michaela Jones
    Danielle Krettek-Cobb
    Leslie Lai
    Nirel JonesMitchell
    Desmond C. Ong
    Carol S. Dweck
    James J. Gross
    James W. Pennebaker
    Nature Reviews Psychology, 2023, 2 : 688 - 701
  • [50] IdealGPT: Iteratively Decomposing Vision and Language Reasoning via Large Language Models
    You, Haoxuan
    Sun, Rui
    Wang, Zhecan
    Chen, Long
    Wang, Gengyu
    Ayyubi, Hammad A.
    Chang, Kai-Wei
    Chang, Shih-Fu
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EMNLP 2023), 2023, : 11289 - 11303