MITIGATING DATASET BIAS IN IMAGE CAPTIONING THROUGH CLIP CONFOUNDER-FREE CAPTIONING NETWORK

Cited by: 2
Authors
Kim, Yeonju [1 ]
Kim, Junho [1 ]
Lee, Byung-Kwan [1 ]
Shin, Sebin [1 ]
Ro, Yong Man [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Image & Video Syst Lab, Daejeon, South Korea
Keywords
Image captioning; Causal inference; Dataset bias; Global visual confounder; CLIP;
DOI
10.1109/ICIP49359.2023.10222502
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dataset bias has been identified as a major challenge in image captioning. When an image captioning model predicts a word, it should rely on the visual evidence associated with that word; instead, the model tends to rely on contextual evidence inherited from the dataset bias and produces biased captions, especially when the dataset is skewed toward specific situations. To solve this problem, we approach the task from a causal inference perspective and design a causal graph. Based on this causal graph, we propose a novel method named C²Cap, a CLIP confounder-free captioning network. We use a global visual confounder to control the confounding factors in the image and train the model to produce debiased captions. We validate the proposed method on the MSCOCO benchmark and demonstrate its effectiveness. https://github.com/yeonju7kim/C2Cap
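The deconfounding idea in the abstract follows the standard backdoor adjustment, P(w | do(I)) ≈ Σ_z P(w | I, z) P(z), where z ranges over a dictionary of global visual confounders. A minimal NumPy sketch of this intervention step is shown below; the confounder dictionary `Z` (e.g., clustered global CLIP features), the uniform prior, and the concatenation-based fusion are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                          # toy feature dimension
K = 4                          # number of confounder dictionary entries
Z = rng.normal(size=(K, d))    # confounder dictionary: K prototype visual contexts
p_z = np.full(K, 1.0 / K)      # prior over confounders (assumed uniform here)

def softmax(s):
    e = np.exp(s - s.max())
    return e / e.sum()

def deconfounded_feature(x, Z, p_z):
    """Soft expectation over the confounder dictionary, weighted by
    attention times the prior, fused with the original image feature."""
    attn = softmax(Z @ x)          # relevance of each confounder to the image
    weights = attn * p_z           # reweight by the confounder prior
    weights /= weights.sum()
    z_hat = weights @ Z            # expected confounder under adjusted weights
    return np.concatenate([x, z_hat])  # fused input for a caption decoder

x = rng.normal(size=d)             # a global image feature (e.g., from CLIP)
fused = deconfounded_feature(x, Z, p_z)
print(fused.shape)                 # (16,)
```

In the full model, the fused feature would condition the word predictor so that each confounder's contribution is fixed by its prior rather than by its spurious correlation with the image.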
Pages: 1720 - 1724
Page count: 5
Related Papers
50 items in total
  • [21] Sieve: Multimodal Dataset Pruning Using Image Captioning Models
    Mahmoud, Anas
    Elhoushi, Mostafa
    Abbas, Amro
    Yang, Yu
    Ardalani, Newsha
    Leather, Hugh
    Morcos, Ari S.
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 22423 - 22432
  • [22] RNIC-A retrospect network for image captioning
    Yi, Xiu-Long
    Hua, Rong
    Fu, You
    Zheng, Du-Lei
    Wang, Zhi-Yu
    SOFT COMPUTING, 2022, 26 (04) : 1501 - 1507
  • [23] Hierarchical Deep Neural Network for Image Captioning
    Su, Yuting
    Li, Yuqian
    Xu, Ning
    Liu, An-An
    NEURAL PROCESSING LETTERS, 2020, 52 (02) : 1057 - 1067
  • [24] Dense semantic embedding network for image captioning
    Xiao, Xinyu
    Wang, Lingfeng
    Ding, Kun
    Xiang, Shiming
    Pan, Chunhong
    PATTERN RECOGNITION, 2019, 90 : 285 - 296
  • [25] Relation Network and Causal Reasoning for Image Captioning
    Zhou, Dongming
    Yang, Jing
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT, CIKM 2021, 2021, : 2718 - 2727
  • [26] Towards Image Captioning for the Portuguese Language: Evaluation on a Translated Dataset
    Gondim, Joao
    Claro, Daniela Barreiro
    Souza, Marlo
    ICEIS: PROCEEDINGS OF THE 24TH INTERNATIONAL CONFERENCE ON ENTERPRISE INFORMATION SYSTEMS - VOL 1, 2022, : 384 - 393
  • [27] A SEQUENTIAL GUIDING NETWORK WITH ATTENTION FOR IMAGE CAPTIONING
    Sow, Daouda
    Qin, Zengchang
    Niasse, Mouhamed
    Wan, Tao
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3802 - 3806
  • [28] Bidirectional interactive alignment network for image captioning
    Cao, Xinrong
    Yan, Peixin
    Hu, Rong
    Li, Zuoyong
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [29] RNIC-A retrospect network for image captioning
    Yi, Xiu-Long
    Hua, Rong
    Fu, You
    Zheng, Du-Lei
    Wang, Zhi-Yu
    SOFT COMPUTING, 2022, 26 (04) : 1501 - 1507
  • [30] A Context Semantic Auxiliary Network for Image Captioning
    Li, Jianying
    Shao, Xiangjun
    INFORMATION, 2023, 14 (07)