MITIGATING DATASET BIAS IN IMAGE CAPTIONING THROUGH CLIP CONFOUNDER-FREE CAPTIONING NETWORK

Cited by: 2
Authors
Kim, Yeonju [1 ]
Kim, Junho [1 ]
Lee, Byung-Kwan [1 ]
Shin, Sebin [1 ]
Ro, Yong Man [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Image & Video Syst Lab, Daejeon, South Korea
Keywords
Image captioning; Causal inference; Dataset bias; Global visual confounder; CLIP;
DOI
10.1109/ICIP49359.2023.10222502
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Dataset bias has been identified as a major challenge in image captioning. When predicting a word, an image captioning model should rely on the visual evidence associated with that word, but it tends instead to exploit contextual correlations inherited from the dataset bias, producing biased captions, especially when the dataset is skewed toward specific situations. To address this problem, we approach it from a causal inference perspective and design a causal graph. Based on this causal graph, we propose a novel method named C²Cap, a CLIP confounder-free captioning network. We use a global visual confounder to control the confounding factors in the image and train the model to produce debiased captions. We validate the proposed method on the MSCOCO benchmark and demonstrate its effectiveness. https://github.com/yeonju7kim/C2Cap
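The abstract describes controlling a global visual confounder via a causal graph. The paper's exact architecture is not given in this record; below is only a minimal, generic sketch of the common dictionary-based approximation of backdoor adjustment used in confounder-free vision models, where P(Y|do(X)) ≈ Σ_z P(Y|X,z)P(z) is approximated by attending over a fixed dictionary of confounder prototypes (e.g., cluster centers of global visual features). The function name, shapes, and the concatenation step are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def backdoor_attention(x, confounders):
    """Sketch of dictionary-based backdoor adjustment (illustrative, not C²Cap itself).

    x           : (d,)  global visual feature of the current image
    confounders : (K, d) dictionary of confounder prototypes (assumed precomputed,
                  e.g., by clustering CLIP image features over the training set)
    Returns the input feature concatenated with the expected confounder feature.
    """
    # Similarity of the image feature to each confounder prototype (scaled dot product).
    scores = confounders @ x / np.sqrt(x.shape[-1])
    # Softmax over the dictionary: an approximate attention-based weighting of z.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Expectation over the confounder dictionary, approximating the sum over z.
    z = weights @ confounders
    # Feed both the original feature and the confounder summary to the decoder.
    return np.concatenate([x, z])
```

In practice such a module is inserted before the caption decoder so that word prediction conditions on the deconfounded representation rather than on the raw, bias-carrying visual feature.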
Pages: 1720-1724
Number of pages: 5