Prompt-guided bidirectional deep fusion network for referring image segmentation

Times cited: 0

Authors:

Wu, Junxian [1,2]

Zhang, Yujia [1]

Kampffmeyer, Michael [3]

Zhao, Xiaoguang [1]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China

[3] UiT Arctic Univ Norway, Dept Phys & Technol, Tromso, Norway

Source:

NEUROCOMPUTING | 2025 / Vol. 616

Funding:

National Natural Science Foundation of China;

Keywords:

Referring image segmentation; Prompt-guided bidirectional encoder fusion; Prompt-guided cross-modal interaction;

DOI:

10.1016/j.neucom.2024.128899

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Referring image segmentation aims to segment the object described by a natural language expression. The task is challenging because language expressions are intricate and varied, and because the relevant image region must be identified among multiple objects. Current models predominantly employ language-aware early fusion, which may misinterpret language expressions because the language encoder receives no explicit visual guidance. Additionally, early fusion methods cannot adequately leverage high-level context. To address these limitations, this paper introduces the Prompt-guided Bidirectional Deep Fusion Network (PBDF-Net) to strengthen the fusion of the language and vision modalities. In contrast to traditional unidirectional early fusion approaches, PBDF-Net employs a prompt-guided bidirectional encoder fusion (PBEF) module that promotes mutual cross-modal fusion across multiple stages of the vision and language encoders. Furthermore, PBDF-Net incorporates a prompt-guided cross-modal interaction (PCI) module at the late fusion stage, enabling a more profound integration of contextual information from both modalities and, in turn, more accurate target segmentation. Comprehensive experiments on the RefCOCO, RefCOCO+, G-Ref, and ReferIt datasets substantiate the efficacy of the proposed method, showing significant performance improvements over existing approaches.
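The abstract describes the encoder fusion only at a high level. As a rough illustration of what "bidirectional" cross-modal fusion with prompt tokens can mean, the minimal pure-Python sketch below lets visual tokens attend to [prompt; language] tokens and language tokens attend to [prompt; visual] tokens, each with a residual connection, so that both encoders receive information from the other modality. This is not the PBEF module from the paper: the prompt tokens, the function names (`bidirectional_fusion`, `attention`, `softmax`, `add`), the single-head design, and the absence of learned query/key/value projections are all simplifying assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]  # one row per token, one column per feature


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product attention (no learned projections, an assumption)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum, used as a residual connection."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_fusion(vision: Matrix, language: Matrix,
                         prompts: Matrix) -> Tuple[Matrix, Matrix]:
    """Hypothetical one-stage bidirectional fusion guided by prompt tokens.

    Both directions read the ORIGINAL inputs, so the update is simultaneous.
    """
    vision_ctx = prompts + language    # language -> vision direction
    language_ctx = prompts + vision    # vision -> language direction
    new_vision = add(vision, attention(vision, vision_ctx, vision_ctx))
    new_language = add(language, attention(language, language_ctx, language_ctx))
    return new_vision, new_language


if __name__ == "__main__":
    vision = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 visual tokens, d = 2
    language = [[0.5, 0.5], [1.0, -1.0]]           # 2 word tokens
    prompts = [[0.0, 0.0]]                         # 1 hypothetical prompt token
    v2, l2 = bidirectional_fusion(vision, language, prompts)
    print(len(v2), len(l2), len(v2[0]))  # 3 2 2
```

In a real network this step would be repeated at several encoder stages with learned projections and multi-head attention; the contrast with unidirectional early fusion is that here `language` is also updated from `vision`, rather than only the visual features being conditioned on the text.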


Pages: 12
