Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning

Cited by: 0
Authors
Li, Wenyan [1]
Li, Jiaang [1]
Ramos, Rita [2]
Tang, Raphael [3]
Elliott, Desmond [1]
Affiliations
[1] Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark
[2] Univ Lisbon, Inst Super Tecn, INESC ID, Lisbon, Portugal
[3] Comcast Appl AI, Philadelphia, PA USA
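Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract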
Recent advances in retrieval-augmented models for image captioning highlight the benefit of retrieving related captions for efficient, lightweight models with strong domain-transfer capabilities. While these models demonstrate the success of retrieval augmentation, retrieval models are still far from perfect in practice: the retrieved information can sometimes mislead the model, resulting in incorrect generation and worse performance. In this paper, we analyze the robustness of SMALLCAP, a retrieval-augmented captioning model. Our analysis shows that the model is sensitive to tokens that appear in the majority of the retrieved captions, and input attribution shows that those tokens are likely to be copied into the generated output. Given these findings, we propose training the model on retrieved captions sampled from more diverse sets. This decreases the chance that the model learns to copy majority tokens and improves both in-domain and cross-domain performance.
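The abstract does not spell out how the sampling is done, so the following is only a minimal sketch of the general idea, under stated assumptions: instead of always feeding the model the top-k retrieved captions, k captions are drawn at random from a larger top-ranked pool. The function names, the default values of `k` and `pool_size`, and the whitespace tokenization are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def sample_retrieved_captions(ranked_captions, k=4, pool_size=7, rng=None):
    """Draw k captions at random from the top `pool_size` retrieved captions.

    Hypothetical helper: `k` and `pool_size` are illustrative defaults,
    not values reported in the abstract.
    """
    if rng is None:
        rng = random.Random()
    pool = list(ranked_captions[:pool_size])
    if len(pool) <= k:
        # Not enough candidates to sample from; use everything available.
        return pool
    return rng.sample(pool, k)


def majority_tokens(captions):
    """Return tokens that occur in more than half of the given captions.

    Uses naive lowercase whitespace tokenization (an assumption made
    purely for illustration).
    """
    counts = Counter()
    for caption in captions:
        # Count each token at most once per caption.
        counts.update(set(caption.lower().split()))
    return {tok for tok, c in counts.items() if c > len(captions) / 2}
```

Comparing `majority_tokens` on the fixed top-k captions with the same function on a sampled set gives a rough way to check whether sampling actually varies the tokens the model sees most often during training.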
Pages: 9285-9299
Page count: 15