Dual Context Perception Transformer for Referring Image Segmentation

Cited by: 0
Authors
Kong, Yuqiu [1]
Liu, Junhua [1]
Yao, Cuili [1]
Affiliations
[1] Dalian University of Technology, Dalian 116024, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Referring image segmentation; Vision-linguistic alignment; Multi-modal fusion;
DOI
10.1007/978-981-97-8620-6_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Referring image segmentation segments target objects in an image according to language expressions. Existing methods mainly integrate multi-modal features with attention mechanisms. However, most of them favor the features of a single modality during the fusion stage and fall short in exploring cross-modal contextual information, which is critical for accurately localizing target regions. To this end, we propose a novel architecture named Dual Context Perception Transformer (DCPformer), which considers both visual and linguistic contextual information during the fusion and reasoning stages. Specifically, a Cross-modal Context-aware Perception Module (CCPM) is designed to model cross-modal alignment in a unified visual-linguistic representation space. Furthermore, we propose an Information Feedback Module (IFM) that generates a rectification mask from deep-scale features and uses it to filter signals unrelated to the target object out of shallower-scale features. Extensive experiments show that the proposed DCPformer achieves state-of-the-art performance against existing methods on three challenging benchmarks.
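The abstract gives no implementation details for the IFM, so the following is only a minimal sketch of the general idea it describes: a mask computed from a coarse (deep) feature map gates a finer (shallow) feature map. The function names, the linear projection, the sigmoid gating, and the nearest-neighbour upsampling are all assumptions made for illustration, not the authors' design.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rectification_mask(deep, weights, bias=0.0):
    """Project each deep-feature pixel to one logit, then squash to (0, 1).

    deep: H x W x C nested lists; weights: list of length C.
    The linear projection is a hypothetical stand-in for a learned layer.
    """
    return [
        [sigmoid(sum(w * v for w, v in zip(weights, px)) + bias) for px in row]
        for row in deep
    ]


def upsample_nearest(mask, factor):
    """Repeat every mask value `factor` times along both spatial axes."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def apply_mask(shallow, mask):
    """Scale every channel of each shallow-feature pixel by its mask value."""
    return [
        [[m * v for v in px] for px, m in zip(f_row, m_row)]
        for f_row, m_row in zip(shallow, mask)
    ]


# Toy data: a 1x2 deep map with 2 channels and a 2x4 shallow map with 1 channel.
deep = [[[4.0, 4.0], [-4.0, -4.0]]]
shallow = [[[1.0] for _ in range(4)] for _ in range(2)]

mask = upsample_nearest(rectification_mask(deep, [1.0, 1.0]), factor=2)
filtered = apply_mask(shallow, mask)
```

In this toy setup the left half of the shallow map, where the deep feature responds strongly, passes through almost unchanged, while the right half is suppressed toward zero.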
Pages: 216-230
Page count: 15