Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning

Cited by: 5
Authors
Honda, Ukyo [1 ,2 ]
Watanabe, Taro [3 ]
Matsumoto, Yuji [2 ]
Affiliations
[1] CyberAgent Inc, Tokyo, Japan
[2] RIKEN, Tokyo, Japan
[3] Nara Inst Sci & Technol, Ikoma, Nara, Japan
DOI
10.1109/WACV56688.2023.00118
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Discriminativeness is a desirable feature of image captions: captions should describe the characteristic details of input images. However, recent high-performing captioning models, which are trained with reinforcement learning (RL), tend to generate overly generic captions despite their high performance on various other criteria. First, we investigate the cause of this unexpectedly low discriminativeness and show that RL has a deeply rooted side effect of limiting the output vocabulary to high-frequency words. The limited vocabulary is a severe bottleneck for discriminativeness, as it is difficult for a model to describe details beyond its vocabulary. Then, based on this identification of the bottleneck, we drastically recast discriminative image captioning as a much simpler task of encouraging low-frequency word generation. Inspired by long-tail classification and debiasing methods, we propose methods that easily switch off-the-shelf RL models to discriminativeness-aware models with only a single epoch of fine-tuning on a part of the parameters. Extensive experiments demonstrate that our methods significantly enhance the discriminativeness of off-the-shelf RL models and even outperform previous discriminativeness-aware methods at much smaller computational cost. Detailed analysis and human evaluation also verify that our methods boost discriminativeness without sacrificing the overall quality of captions.
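The abstract's core idea — counteracting the high-frequency-word bias with techniques borrowed from long-tail classification — can be illustrated with a logit-adjustment sketch. This is a minimal, hypothetical illustration of the general technique (subtracting a scaled log-prior from token logits), not the paper's actual implementation; the toy vocabulary, frequencies, and the `tau` value are assumptions.

```python
# Illustrative sketch: logit adjustment from long-tail classification,
# applied to a captioner's output distribution over its vocabulary.
# High-frequency (generic) tokens are penalized, low-frequency (specific)
# tokens are boosted. Numbers below are toy assumptions.
import numpy as np

def adjust_logits(logits, word_freq, tau=1.0, eps=1e-8):
    """Subtract tau * log(prior) from each token's logit, so tokens with a
    large corpus prior lose score relative to rare, descriptive tokens."""
    prior = word_freq / word_freq.sum()
    return logits - tau * np.log(prior + eps)

def softmax(x):
    z = x - x.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy 2-word vocabulary: a very frequent generic word vs. a rare specific one.
logits = np.array([2.0, 1.0])      # the RL model prefers the generic word
freq = np.array([9000.0, 10.0])    # corpus frequencies

p_before = softmax(logits)
p_after = softmax(adjust_logits(logits, freq, tau=1.0))
print(p_before, p_after)  # the rare word dominates after adjustment
```

With `tau = 0` the original distribution is recovered; larger `tau` pushes probability mass further toward rare words, which mirrors the trade-off between discriminativeness and conventional caption quality discussed in the abstract.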
Pages: 1124-1134 (11 pages)
Related Papers
50 items total
  • [31] Deep Learning Approaches on Image Captioning: A Review
    Ghandi, Taraneh
    Pourreza, Hamidreza
    Mahyar, Hamidreza
    ACM COMPUTING SURVEYS, 2024, 56 (03)
  • [32] Learning to Collocate Neural Modules for Image Captioning
    Yang, Xu
    Zhang, Hanwang
    Cai, Jianfei
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4249 - 4259
  • [33] A Comprehensive Survey of Deep Learning for Image Captioning
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    ACM COMPUTING SURVEYS, 2019, 51 (06)
  • [34] Meta captioning: A meta learning based remote sensing image captioning framework
    Yang, Qiaoqiao
    Ni, Zihao
    Ren, Peng
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 186 : 190 - 200
  • [35] Learning Distinct and Representative Modes for Image Captioning
    Chen, Qi
    Deng, Chaorui
    Wu, Qi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [36] Image Captioning with Partially Rewarded Imitation Learning
    Yu, Xintong
    Guo, Tszhang
    Fu, Kun
    Li, Lei
    Zhang, Changshui
    Zhang, Jianwei
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [37] Facilitated Deep Learning Models for Image Captioning
    Azhar, Imtinan
    Afyouni, Imad
    Elnagar, Ashraf
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021,
  • [38] CaMEL: Mean Teacher Learning for Image Captioning
    Barraco, Manuele
    Stefanini, Matteo
    Cornia, Marcella
    Cascianelli, Silvia
    Baraldi, Lorenzo
    Cucchiara, Rita
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4087 - 4094
  • [39] Neural Symbolic Representation Learning for Image Captioning
    Wang, Xiaomei
    Ma, Lin
    Fu, Yanwei
    Xue, Xiangyang
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 312 - 321
  • [40] Collaborative Learning Method for Natural Image Captioning
    Wang, Rongzhao
    Liu, Libo
    DATA SCIENCE (ICPCSEE 2022), PT I, 2022, 1628 : 249 - 261