Dual Context Perception Transformer for Referring Image Segmentation

Citations: 0
Authors
Kong, Yuqiu [1]
Liu, Junhua [1]
Yao, Cuili [1]
Affiliations
[1] Dalian Univ Technol, Dalian 116024, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Referring image segmentation; Vision-linguistic alignment; Multi-modal fusion
DOI
10.1007/978-981-97-8620-6_15
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation segments target objects in an image according to language expressions. Existing methods mainly integrate multi-modal features with attention mechanisms. However, most of them favor the features of a single modality during the fusion stage and fall short in exploring cross-modal contextual information, which is critical for accurately localizing target regions. To this end, we propose a novel architecture named Dual Context Perception Transformer (DCPformer), which considers both visual and linguistic contextual information during the fusion and reasoning stages. Specifically, a Cross-modal Context-aware Perception Module (CCPM) is designed to model cross-modal alignment in a unified visual-linguistic representation space. Furthermore, we propose an Information Feedback Module (IFM) that generates a rectification mask from deep-scale features and uses it to filter out signals unrelated to the target object from the features of shallower scales. Extensive experiments show that the proposed DCPformer achieves state-of-the-art performance compared with existing methods on three challenging benchmarks.
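
The abstract describes the IFM only at a high level. As a rough illustration of the general idea of gating shallower features with a mask derived from a deeper scale, the sketch below is not the authors' implementation. It assumes a sigmoid to form the mask, nearest-neighbor upsampling, and element-wise multiplication. The function names and the toy data are hypothetical.

```python
import math


def rectification_mask(deep_scores):
    """Squash per-location scores from the deepest scale into (0, 1).

    Assumption: a plain sigmoid; the paper's actual mask generation is not
    specified in the abstract.
    """
    return [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in deep_scores]


def upsample_nearest(mask, factor):
    """Repeat each mask cell factor x factor times to match a shallower scale."""
    out = []
    for row in mask:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def filter_shallow(shallow, deep_scores, factor):
    """Down-weight shallow-scale responses where the upsampled mask is low."""
    mask = upsample_nearest(rectification_mask(deep_scores), factor)
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(shallow, mask)]


# Toy data: a 2x2 deep-scale score map gates a 4x4 shallow-scale feature map.
deep_scores = [[6.0, -6.0],
               [-6.0, -6.0]]
shallow = [[1.0] * 4 for _ in range(4)]
filtered = filter_shallow(shallow, deep_scores, factor=2)
```

In this toy case only the top-left 2x2 block of `filtered` stays close to its original value, and the rest is suppressed toward zero. A learned model would presumably derive the scores from fused visual-linguistic features instead of fixed numbers.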
Pages: 216-230
Page count: 15