Bottom-Up Shift and Reasoning for Referring Image Segmentation

Cited by: 58

Authors:

Yang, Sibei [1]

Xia, Meng [2]

Li, Guanbin [2]

Zhou, Hong-Yu [3]

Yu, Yizhou [3,4]

Affiliations:

[1] ShanghaiTech Univ, Shanghai, Peoples R China

[2] Sun Yat Sen Univ, Guangzhou, Peoples R China

[3] Univ Hong Kong, Hong Kong, Peoples R China

[4] Deepwise AI Lab, Beijing, Peoples R China

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Funding:

National Natural Science Foundation of China

DOI:

10.1109/CVPR46437.2021.01111

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Referring image segmentation aims to segment the referent, that is, the object or stuff in an image that a natural language expression refers to. Its main challenge lies in how to effectively and efficiently differentiate between the referent and other objects of the same category as the referent. In this paper, we tackle the challenge by jointly performing compositional visual reasoning and accurate segmentation in a single stage via the proposed novel Bottom-Up Shift (BUS) and Bidirectional Attentive Refinement (BIAR) modules. Specifically, BUS progressively locates the referent along hierarchical reasoning steps implied by the expression. At each step, it locates the corresponding visual region by disambiguating between similar regions, where the disambiguation is based on the relationships between regions. Through this explainable visual reasoning, BUS explicitly aligns linguistic components with visual regions so that it can identify all the entities mentioned in the expression. BIAR fuses multi-level features via two-way attentive message passing, which captures the visual details relevant to the referent to refine segmentation results. Experimental results demonstrate that the proposed method, consisting of the BUS and BIAR modules, not only consistently surpasses all existing state-of-the-art algorithms across common benchmark datasets but also visualizes interpretable reasoning steps for stepwise segmentation. Code is available at https://github.com/incredibleXM/BUSNet.
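The stepwise idea described in the abstract (locate candidate regions at each reasoning step, then disambiguate similar regions by their relationships to regions found earlier) can be illustrated with a toy, purely symbolic sketch. This is not the authors' BUS module, which operates on learned visual and linguistic features; the `Region` class, the `left_of` relation, the `RELATIONS` table, and the `locate_referent` function are all hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical stand-in for a visual region (not from the paper)."""
    name: str        # illustrative region identifier
    category: str    # object category of the region
    x: float         # horizontal center, in arbitrary units


def left_of(a: Region, b: Region) -> bool:
    """Toy spatial relation: region a lies to the left of region b."""
    return a.x < b.x


# Assumed relation vocabulary; a real system would learn these relationships.
RELATIONS = {"left of": left_of}


def locate_referent(regions, steps):
    """Walk the reasoning steps bottom-up.

    Each step is (category, relation). A step first collects all regions of
    the category, then, if a relation is given, keeps only the candidates
    that satisfy it with at least one region located in the previous step.
    """
    located = []
    for category, relation in steps:
        candidates = [r for r in regions if r.category == category]
        if relation is not None:
            holds = RELATIONS[relation]
            candidates = [c for c in candidates
                          if any(holds(c, prev) for prev in located)]
        located = candidates
    return located


# Expression: "the cup left of the laptop" (two cups are present).
regions = [
    Region("cup_a", "cup", 1.0),
    Region("laptop_1", "laptop", 5.0),
    Region("cup_b", "cup", 9.0),
]
steps = [("laptop", None), ("cup", "left of")]
print([r.name for r in locate_referent(regions, steps)])  # ['cup_a']
```

In this sketch the two cups cannot be told apart by category alone; only the relationship to the previously located laptop singles out the referent, which mirrors the disambiguation role the abstract assigns to relationships between regions.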

Pages: 11261-11270

Page count: 10
