Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning

Cited by: 0

Authors:

Li, Wenyan [1]

Li, Jiaang [1]

Ramos, Rita [2]

Tang, Raphael [3]

Elliott, Desmond [1]

Affiliations:

[1] Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark

[2] Univ Lisbon, Inst Super Tecn, INESC ID, Lisbon, Portugal

[3] Comcast Appl AI, Philadelphia, PA USA

Source:

PROCEEDINGS OF THE 62ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, VOL 1: LONG PAPERS | 2024

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent advances in retrieval-augmented models for image captioning highlight the benefit of retrieving related captions for efficient, lightweight models with strong domain-transfer capabilities. While these models demonstrate the success of retrieval augmentation, retrieval models are still far from perfect in practice: the retrieved information can sometimes mislead the model, resulting in incorrect generation and worse performance. In this paper, we analyze the robustness of a retrieval-augmented captioning model SMALLCAP. Our analysis shows that the model is sensitive to tokens that appear in the majority of the retrieved captions, and the input attribution shows that those tokens are likely copied into the generated output. Given these findings, we propose to train the model by sampling retrieved captions from more diverse sets. This decreases the chance that the model learns to copy majority tokens, and improves both in-domain and cross-domain performance.
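
The two ideas in the abstract, tokens shared by most retrieved captions and sampling retrieved captions from a more diverse set, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `majority_tokens` and `sample_captions`, the whitespace tokenization, the 0.5 threshold, and the values of `k` and `pool_size` are all assumptions made for illustration.

```python
import random
from collections import Counter


def majority_tokens(captions, threshold=0.5):
    """Return tokens that occur in more than `threshold` of the captions.

    Assumption: naive lowercase whitespace tokenization; a real system
    would use the captioning model's own tokenizer.
    """
    counts = Counter()
    for caption in captions:
        # Count each token at most once per caption.
        counts.update(set(caption.lower().split()))
    return {tok for tok, n in counts.items() if n / len(captions) > threshold}


def sample_captions(ranked_captions, k=4, pool_size=7, rng=None):
    """Sample k captions uniformly from the top `pool_size` retrieved ones.

    Assumption: `ranked_captions` is sorted by retrieval score, best first.
    Drawing from a larger pool, instead of always taking the top k,
    exposes the model to more varied retrieved context during training.
    """
    rng = rng or random.Random()
    pool = ranked_captions[:pool_size]
    return rng.sample(pool, min(k, len(pool)))


# Hypothetical retrieved captions for one image, best match first.
retrieved = [
    "a dog runs on the beach",
    "a dog plays in the sand",
    "a dog chases a ball",
    "a cat sleeps on a sofa",
    "two children build a sandcastle",
]

print(majority_tokens(retrieved[:4]))
print(sample_captions(retrieved, k=3, pool_size=5, rng=random.Random(0)))
```

In this toy example the tokens shared by most of the top four captions are the ones a model prone to copying would be most likely to reproduce, while sampling from the wider pool changes which captions, and therefore which majority tokens, the model sees on each training step.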

Pages: 9285-9299

Page count: 15

