Bottom-Up Shift and Reasoning for Referring Image Segmentation

Cited by: 58
Authors
Yang, Sibei [1]
Xia, Meng [2]
Li, Guanbin [2]
Zhou, Hong-Yu [3]
Yu, Yizhou [3,4]
Affiliations
[1] ShanghaiTech Univ, Shanghai, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Deepwise AI Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
DOI
10.1109/CVPR46437.2021.01111
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation aims to segment the referent, that is, the object or stuff referred to by a natural language expression in an image. Its main challenge lies in how to effectively and efficiently differentiate the referent from other objects of the same category. In this paper, we tackle the challenge by jointly performing compositional visual reasoning and accurate segmentation in a single stage via the proposed Bottom-Up Shift (BUS) and Bidirectional Attentive Refinement (BIAR) modules. Specifically, BUS progressively locates the referent along hierarchical reasoning steps implied by the expression. At each step, it locates the corresponding visual region by disambiguating between similar regions, where the disambiguation is based on the relationships between regions. Through this explainable visual reasoning, BUS explicitly aligns linguistic components with visual regions so that it can identify all the entities mentioned in the expression. BIAR fuses multi-level features via two-way attentive message passing, which captures the visual details relevant to the referent to refine segmentation results. Experimental results demonstrate that the proposed method, consisting of the BUS and BIAR modules, not only consistently surpasses all existing state-of-the-art algorithms across common benchmark datasets but also visualizes interpretable reasoning steps for stepwise segmentation. Code is available at https://github.com/incredibleXM/BUSNet.
Pages: 11261-11270
Page count: 10