MITIGATING DATASET BIAS IN IMAGE CAPTIONING THROUGH CLIP CONFOUNDER-FREE CAPTIONING NETWORK

Cited by: 2
Authors
Kim, Yeonju [1 ]
Kim, Junho [1 ]
Lee, Byung-Kwan [1 ]
Shin, Sebin [1 ]
Ro, Yong Man [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Elect Engn, Image & Video Syst Lab, Daejeon, South Korea
Keywords
Image captioning; Causal inference; Dataset bias; Global visual confounder; CLIP;
DOI
10.1109/ICIP49359.2023.10222502
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
Dataset bias has been identified as a major challenge in image captioning. When a captioning model predicts a word, it should rely on the visual evidence associated with that word; instead, the model tends to exploit contextual evidence arising from dataset bias and produces biased captions, especially when the dataset is skewed toward specific situations. To address this problem, we take a causal inference perspective and design a causal graph. Based on this graph, we propose a novel method named C²Cap, a CLIP confounder-free captioning network. We use a global visual confounder to control the confounding factors in the image and train the model to produce debiased captions. We validate the proposed method on the MSCOCO benchmark and demonstrate its effectiveness. https://github.com/yeonju7kim/C2Cap
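The abstract does not spell out how the global visual confounder is applied, but causal-inference captioning methods of this kind typically approximate a backdoor adjustment: the image feature attends over a dictionary of global confounder entries (e.g., clustered CLIP image embeddings), and the expectation over that dictionary is fused with the original feature before decoding. The sketch below illustrates that generic mechanism only; the function and variable names (`deconfound`, `confounder_dict`, `prior`) are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def deconfound(visual_feat, confounder_dict, prior=None):
    """Approximate backdoor adjustment with a global visual confounder.

    visual_feat:     (d,)   image feature for the current prediction step
    confounder_dict: (K, d) global confounder entries, e.g. K cluster
                            centers of CLIP image embeddings over the dataset
    prior:           (K,)   prior P(z) over confounder entries; uniform if None
    """
    K, d = confounder_dict.shape
    if prior is None:
        prior = np.full(K, 1.0 / K)
    # similarity between the image feature and each confounder entry
    scores = confounder_dict @ visual_feat / np.sqrt(d)
    # attention weighted by the confounder prior, renormalized to sum to 1
    attn = softmax(scores) * prior
    attn = attn / attn.sum()
    # expectation over confounders; fuse with the original feature
    z = attn @ confounder_dict          # (d,)
    return np.concatenate([visual_feat, z])  # (2d,) deconfounded feature

rng = np.random.default_rng(0)
v = rng.normal(size=512)                    # e.g. a CLIP-sized feature
dictionary = rng.normal(size=(10, 512))     # 10 hypothetical confounder centers
out = deconfound(v, dictionary)
# out has shape (1024,): original feature concatenated with the adjustment term
```

The fused feature would then be fed to the caption decoder in place of the raw visual feature, so that word prediction is conditioned on the confounder expectation rather than on the spurious dataset context alone.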
Pages: 1720-1724
Page count: 5
Related Papers
50 in total
  • [1] Mitigating Gender Bias in Captioning Systems
    Tang, Ruixiang
    Du, Mengnan
    Li, Yuening
    Liu, Zirui
    Zou, Na
    Hu, Xia
    PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, : 633 - 645
  • [2] A Confounder-Free Fusion Network for Aerial Image Scene Feature Representation
    Xiong, Wei
    Xiong, Zhenyu
    Cui, Yaqi
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 5440 - 5454
  • [3] #PraCegoVer: A Large Dataset for Image Captioning in Portuguese
    dos Santos, Gabriel Oliveira
    Colombini, Esther Luna
    Avila, Sandra
    DATA, 2022, 7 (02)
  • [4] A dental intraoral image dataset of gingivitis for image captioning
    Duy, Hoang Bao
    Hue, Tran Thi
    Son, Tong Minh
    Nghia, Le Long
    Lan, Luong Thi Hong
    Duc, Nguyen Minh
    Son, Le Hoang
    DATA IN BRIEF, 2024, 57
  • [5] Human Attention in Image Captioning: Dataset and Analysis
    He, Sen
    Tavakoli, Hamed R.
    Borji, Ali
    Pugeault, Nicolas
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 8528 - 8537
  • [6] Quantifying Societal Bias Amplification in Image Captioning
    Hirota, Yusuke
    Nakashima, Yuta
    Garcia, Noa
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13440 - 13449
  • [7] TSIC-CLIP: Traffic Scene Image Captioning Model Based on Clip
    Zhang, Hao
    Xu, Cheng
    Xu, Bingxin
    Jiane, Muwei
    Liu, Hongzhe
    Li, Xuewei
    INFORMATION TECHNOLOGY AND CONTROL, 2024, 53 (01): : 98 - 114
  • [8] Image Captioning with Generative Adversarial Network
    Amirian, Soheyla
    Rasheed, Khaled
    Taha, Thiab R.
    Arabnia, Hamid R.
    2019 6TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI 2019), 2019, : 272 - 275
  • [9] Attentive Contextual Network for Image Captioning
    Prudviraj, Jeripothula
    Vishnu, Chalavadi
    Mohan, C. Krishna
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [10] Aesthetic image captioning on the FAE-Captions dataset
    Jin, Xin
    Lv, Jianwen
    Zhou, Xinghui
    Xiao, Chaoen
    Li, Xiaodong
    Zhao, Shu
    COMPUTERS & ELECTRICAL ENGINEERING, 2022, 101