Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning

Authors
Li, Wenyan [1]
Li, Jiaang [1]
Ramos, Rita [2]
Tang, Raphael [3]
Elliott, Desmond [1]
Affiliations
[1] Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark
[2] Univ Lisbon, Inst Super Tecn, INESC ID, Lisbon, Portugal
[3] Comcast Appl AI, Philadelphia, PA USA
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent advances in retrieval-augmented models for image captioning highlight the benefit of retrieving related captions for efficient, lightweight models with strong domain-transfer capabilities. While these models demonstrate the success of retrieval augmentation, retrieval models are still far from perfect in practice: the retrieved information can sometimes mislead the model, resulting in incorrect generation and worse performance. In this paper, we analyze the robustness of a retrieval-augmented captioning model SMALLCAP. Our analysis shows that the model is sensitive to tokens that appear in the majority of the retrieved captions, and the input attribution shows that those tokens are likely copied into the generated output. Given these findings, we propose to train the model by sampling retrieved captions from more diverse sets. This decreases the chance that the model learns to copy majority tokens, and improves both in-domain and cross-domain performance.
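The two ideas in the abstract, majority tokens among the retrieved captions and training on captions sampled from a more diverse set, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the 0.5 threshold, and the choice to sample k captions at random from a larger top-ranked pool are all assumptions made for illustration.

```python
import random
from collections import Counter


def majority_tokens(captions, threshold=0.5):
    """Return tokens that occur in more than `threshold` of the captions.

    Hypothetical helper: whitespace tokenization and the threshold value
    are illustrative assumptions, not taken from the paper.
    """
    counts = Counter()
    for caption in captions:
        # Count each token at most once per caption.
        counts.update(set(caption.lower().split()))
    return {tok for tok, n in counts.items() if n / len(captions) > threshold}


def sample_captions(ranked_captions, k=4, pool_size=7, rng=None):
    """Sample k captions from the top `pool_size` retrieved captions.

    Assumed scheme: instead of always feeding the top-k captions during
    training, draw k at random from a larger top-ranked pool so the model
    sees more varied retrieved context.
    """
    rng = rng or random.Random()
    pool = ranked_captions[:pool_size]
    return rng.sample(pool, min(k, len(pool)))


retrieved = ["a dog runs", "a dog sits", "a cat sleeps"]
print(majority_tokens(retrieved))
print(sample_captions(retrieved, k=2, pool_size=3, rng=random.Random(0)))
```

Under these assumptions, a token shared by most retrieved captions is a candidate for being copied into the output, and sampling from a wider pool lowers how often the same tokens dominate the retrieved context during training.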
Pages: 9285-9299
Page count: 15