Switching to Discriminative Image Captioning by Relieving a Bottleneck of Reinforcement Learning

Cited by: 5
Authors
Honda, Ukyo [1 ,2 ]
Watanabe, Taro [3 ]
Matsumoto, Yuji [2 ]
Affiliations
[1] CyberAgent Inc, Tokyo, Japan
[2] RIKEN, Tokyo, Japan
[3] Nara Inst Sci & Technol, Ikoma, Nara, Japan
DOI
10.1109/WACV56688.2023.00118
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Discriminativeness is a desirable feature of image captions: captions should describe the characteristic details of input images. However, recent high-performing captioning models, which are trained with reinforcement learning (RL), tend to generate overly generic captions despite their high performance on various other criteria. First, we investigate the cause of this unexpectedly low discriminativeness and show that RL has a deeply rooted side effect of limiting the output vocabulary to high-frequency words. The limited vocabulary is a severe bottleneck for discriminativeness, as it is difficult for a model to describe details beyond its vocabulary. Then, based on this identification of the bottleneck, we drastically recast discriminative image captioning as a much simpler task of encouraging low-frequency word generation. Inspired by long-tail classification and debiasing methods, we propose methods that easily switch off-the-shelf RL models to discriminativeness-aware models with only a single epoch of fine-tuning on a part of the parameters. Extensive experiments demonstrate that our methods significantly enhance the discriminativeness of off-the-shelf RL models and even outperform previous discriminativeness-aware methods at much smaller computational cost. Detailed analysis and human evaluation also verify that our methods boost discriminativeness without sacrificing the overall quality of captions.
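The abstract's core idea — counteracting the high-frequency-word bias with techniques borrowed from long-tail classification — can be illustrated with a logit-adjustment sketch. This is a minimal, hypothetical illustration of the general technique (subtracting a scaled log-prior from token logits), not the paper's actual implementation; the toy vocabulary, frequencies, and the `tau` value are assumptions.

```python
# Illustrative sketch: logit adjustment from long-tail classification,
# applied to a captioner's output distribution over its vocabulary.
# High-frequency (generic) tokens are penalized, low-frequency (specific)
# tokens are boosted. Numbers below are toy assumptions.
import numpy as np

def adjust_logits(logits, word_freq, tau=1.0, eps=1e-8):
    """Subtract tau * log(prior) from each token's logit, so tokens with a
    large corpus prior lose score relative to rare, descriptive tokens."""
    prior = word_freq / word_freq.sum()
    return logits - tau * np.log(prior + eps)

def softmax(x):
    z = x - x.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

# Toy 2-word vocabulary: a very frequent generic word vs. a rare specific one.
logits = np.array([2.0, 1.0])      # the RL model prefers the generic word
freq = np.array([9000.0, 10.0])    # corpus frequencies

p_before = softmax(logits)
p_after = softmax(adjust_logits(logits, freq, tau=1.0))
print(p_before, p_after)  # the rare word dominates after adjustment
```

With `tau = 0` the original distribution is recovered; larger `tau` pushes probability mass further toward rare words, which mirrors the trade-off between discriminativeness and conventional caption quality discussed in the abstract.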
Pages: 1124-1134 (11 pages)
Related Papers
50 items total
  • [31] Deep Learning Approaches on Image Captioning: A Review
    Ghandi, Taraneh
    Pourreza, Hamidreza
    Mahyar, Hamidreza
    ACM COMPUTING SURVEYS, 2024, 56 (03)
  • [32] Learning to Collocate Neural Modules for Image Captioning
    Yang, Xu
    Zhang, Hanwang
    Cai, Jianfei
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 4249 - 4259
  • [33] A Comprehensive Survey of Deep Learning for Image Captioning
    Hossain, Md Zakir
    Sohel, Ferdous
    Shiratuddin, Mohd Fairuz
    Laga, Hamid
    ACM COMPUTING SURVEYS, 2019, 51 (06)
  • [34] Meta captioning: A meta learning based remote sensing image captioning framework
    Yang, Qiaoqiao
    Ni, Zihao
    Ren, Peng
    ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING, 2022, 186 : 190 - 200
  • [35] Learning Distinct and Representative Modes for Image Captioning
    Chen, Qi
    Deng, Chaorui
    Wu, Qi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [36] Image Captioning with Partially Rewarded Imitation Learning
    Yu, Xintong
    Guo, Tszhang
    Fu, Kun
    Li, Lei
    Zhang, Changshui
    Zhang, Jianwei
    2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2019,
  • [37] Facilitated Deep Learning Models for Image Captioning
    Azhar, Imtinan
    Afyouni, Imad
    Elnagar, Ashraf
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021,
  • [38] CaMEL: Mean Teacher Learning for Image Captioning
    Barraco, Manuele
    Stefanini, Matteo
    Cornia, Marcella
    Cascianelli, Silvia
    Baraldi, Lorenzo
    Cucchiara, Rita
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 4087 - 4094
  • [39] Neural Symbolic Representation Learning for Image Captioning
    Wang, Xiaomei
    Ma, Lin
    Fu, Yanwei
    Xue, Xiangyang
    PROCEEDINGS OF THE 2021 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL (ICMR '21), 2021, : 312 - 321
  • [40] Collaborative Learning Method for Natural Image Captioning
    Wang, Rongzhao
    Liu, Libo
    DATA SCIENCE (ICPCSEE 2022), PT I, 2022, 1628 : 249 - 261