Cross-Domain Image Captioning with Discriminative Finetuning

Cited by: 6
Authors
Dessi, Roberto [1]
Bevilacqua, Michele [2]
Gualdoni, Eleonora [3]
Carraz Rakotonirina, Nathanael [3]
Franzon, Francesca [3]
Baroni, Marco [4]
Affiliations
[1] UPF, Meta AI, Barcelona, Spain
[2] Samaya AI, Mountain View, CA USA
[3] UPF, Barcelona, Spain
[4] UPF, ICREA, Barcelona, Spain
Funding
European Research Council
DOI
10.1109/CVPR52729.2023.00670
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Neural captioners are typically trained to mimic human-generated references without optimizing for any specific communication goal, leading to problems such as the generation of vague captions. In this paper, we show that fine-tuning an out-of-the-box neural captioner with a self-supervised discriminative communication objective helps to recover a plain, visually descriptive language that is more informative about image contents. Given a target image, the system must learn to produce a description that enables an out-of-the-box text-conditioned image retriever to identify that image among a set of candidates. We experiment with the popular ClipCap captioner, also replicating the main results with BLIP. In terms of similarity to ground-truth human descriptions, the captions emerging from discriminative finetuning lag slightly behind those generated by the non-finetuned model, when the latter is trained and tested on the same caption dataset. However, when the model is used without further tuning to generate captions for out-of-domain datasets, our discriminatively-finetuned captioner generates descriptions that resemble human references more than those produced by the same captioner without finetuning. We further show that, on the Conceptual Captions dataset, discriminatively finetuned captions are more helpful than either vanilla ClipCap captions or ground-truth captions for human annotators tasked with an image discrimination task.
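The discriminative objective described in the abstract can be illustrated with a small sketch. It assumes, hypothetically, that the retriever scores a caption against each candidate image by a dot product of embeddings and that the training signal is the log-probability assigned to the target image; the abstract does not specify these details, and the function name and toy vectors below are invented for illustration only.

```python
import math


def retrieval_log_prob(caption_vec, image_vecs, target_idx):
    """Log-probability that a retriever picks the target image for a caption.

    Assumption: scores are dot products between the caption embedding and
    each candidate image embedding, normalized with a softmax over candidates.
    """
    scores = [sum(c * v for c, v in zip(caption_vec, img)) for img in image_vecs]
    max_score = max(scores)  # subtract the max for numerical stability
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return scores[target_idx] - log_norm


# Toy embeddings (made up): three candidate images, the target is index 0.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
specific_caption = [2.0, -1.0]  # aligned with the target only
vague_caption = [1.0, 1.0]      # fits several candidates about equally well

reward_specific = retrieval_log_prob(specific_caption, candidates, 0)
reward_vague = retrieval_log_prob(vague_caption, candidates, 0)
```

In this toy setting the caption that singles out the target receives a higher reward than the vague one, which is the pressure toward informative descriptions that the abstract attributes to discriminative finetuning.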
Pages: 6935-6944
Number of pages: 10