Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation

Cited by: 0
Authors
Yan, Yichen [1, 2]
He, Xingjian [1]
Chen, Sihan [2]
Liu, Jing [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
referring image segmentation; iterative calibration; language reconstruction
DOI
10.1145/3652583.3658095
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation aims to segment the object referred to by a natural language expression from an image. The primary challenge lies in efficiently propagating fine-grained semantic information from textual features to visual features. Many recent works use a Transformer to address this challenge. However, conventional Transformer decoders can distort linguistic information as layers get deeper, leading to suboptimal results. In this paper, we introduce CRFormer, a model that iteratively calibrates multi-modal features in the Transformer decoder. We start by generating language queries from vision features, with each query emphasizing a different aspect of the input language. Then, we propose a novel Calibration Decoder (CDec) in which the multi-modal features can be iteratively calibrated by the input language features. In the Calibration Decoder, we use the output of each decoder layer together with the original language features to generate new queries for continuous calibration, which gradually updates the language features. Building on CDec, we introduce a Language Reconstruction Module and a reconstruction loss. This module uses the queries from the final decoder layer to reconstruct the input language and compute the reconstruction loss, which further prevents the language information from being lost or distorted. Our experiments consistently show that our approach outperforms state-of-the-art methods on the RefCOCO, RefCOCO+, and G-Ref datasets.
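
As a rough illustration of the calibration idea in the abstract, the sketch below shows queries passing through several decoder layers, with the original language features re-injected after every layer, followed by a reconstruction loss computed from the final-layer queries. This is not the CRFormer implementation: the abstract does not specify the layer design, so the function names (`attend`, `calibrated_decode`, `reconstruction_loss`), the residual additive update, the absence of learned projections, and the mean-pooled squared-error loss are all assumptions made for this example.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Scaled dot-product attention: returns one output vector per query."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim)])
    return out

def add(a, b):
    """Element-wise sum of two equally shaped lists of vectors."""
    return [[x + y for x, y in zip(r, s)] for r, s in zip(a, b)]

def mean_pool(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def calibrated_decode(queries, visual, language, num_layers=3):
    """Hypothetical calibration loop (assumed form, no learned weights)."""
    for _ in range(num_layers):
        # Ordinary decoder step: queries gather evidence from visual features.
        hidden = add(queries, attend(queries, visual, visual))
        # Calibration step (assumption): the layer output attends to the
        # ORIGINAL language features, which yields the next layer's queries.
        queries = add(hidden, attend(hidden, language, language))
    return queries

def reconstruction_loss(final_queries, language):
    """Assumed loss: mean squared error between pooled queries and language."""
    pred = mean_pool(final_queries)
    target = mean_pool(language)
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

if __name__ == "__main__":
    language = [[1.0, 0.0], [0.0, 1.0]]                # 2 word features
    visual = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]     # 3 pixel features
    queries = [[0.1, 0.2], [0.3, -0.1]]                # 2 language queries
    final = calibrated_decode(queries, visual, language)
    print(reconstruction_loss(final, language))
```

The point of the sketch is the data flow: because `language` is passed unchanged into every layer, later layers cannot drift arbitrarily far from the input expression, and the loss gives the final queries an explicit incentive to retain it.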
Pages: 451-459
Number of pages: 9