Bottom-Up Shift and Reasoning for Referring Image Segmentation

Cited by: 58
Authors
Yang, Sibei [1]
Xia, Meng [2]
Li, Guanbin [2]
Zhou, Hong-Yu [3]
Yu, Yizhou [3,4]
Affiliations
[1] ShanghaiTech Univ, Shanghai, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Deepwise AI Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR46437.2021.01111
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation aims to segment the referent, i.e., the object or stuff in an image that a natural language expression refers to. Its main challenge lies in how to effectively and efficiently differentiate between the referent and other objects of the same category. In this paper, we tackle this challenge by jointly performing compositional visual reasoning and accurate segmentation in a single stage via the proposed Bottom-Up Shift (BUS) and Bidirectional Attentive Refinement (BIAR) modules. Specifically, BUS progressively locates the referent along the hierarchical reasoning steps implied by the expression. At each step, it locates the corresponding visual region by disambiguating between similar regions, where the disambiguation is based on the relationships between regions. Through this explainable visual reasoning, BUS explicitly aligns linguistic components with visual regions, which allows it to identify all entities mentioned in the expression. BIAR fuses multi-level features via two-way attentive message passing, capturing the visual details relevant to the referent in order to refine the segmentation results. Experimental results demonstrate that the proposed method, consisting of the BUS and BIAR modules, not only consistently surpasses all existing state-of-the-art algorithms across common benchmark datasets but also visualizes interpretable reasoning steps for stepwise segmentation. Code is available at https://github.com/incredibleXM/BUSNet.
Pages: 11261-11270
Page count: 10