Dual Context Perception Transformer for Referring Image Segmentation

Cited by: 0

Authors:

Kong, Yuqiu [1]

Liu, Junhua [1]

Yao, Cuili [1]

Affiliations:

[1] Dalian University of Technology, Dalian 116024, People's Republic of China

Source:

PATTERN RECOGNITION AND COMPUTER VISION, PT V, PRCV 2024 | 2025 / Vol. 15035

Funding:

National Natural Science Foundation of China;

Keywords:

Referring image segmentation; Vision-linguistic alignment; Multi-modal fusion;

DOI:

10.1007/978-981-97-8620-6_15

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring image segmentation segments target objects in an image according to language expressions. Existing methods mainly integrate multi-modal features with attention mechanisms. However, most of them favor the features of a single modality during the fusion stage and fall short in exploring cross-modal contextual information, which is critical for accurately localizing target regions. To this end, we propose a novel architecture named Dual Context Perception Transformer (DCPformer), which considers both visual and linguistic contextual information during the fusion and reasoning stages. Specifically, a Cross-modal Context-aware Perception Module (CCPM) is designed to model cross-modal alignment in a unified visual-linguistic representation space. Furthermore, we propose an Information Feedback Module (IFM) that generates a rectification mask based on deep-scale features and filters out signals unrelated to the target object in features of shallower scales. Extensive experiments show that the proposed DCPformer achieves state-of-the-art performance compared with existing methods on three challenging benchmarks.

Pages: 216-230

Page count: 15
