Bottom-Up Shift and Reasoning for Referring Image Segmentation

Cited by: 58
Authors
Yang, Sibei [1]
Xia, Meng [2]
Li, Guanbin [2]
Zhou, Hong-Yu [3]
Yu, Yizhou [3, 4]
Affiliations
[1] ShanghaiTech Univ, Shanghai, Peoples R China
[2] Sun Yat Sen Univ, Guangzhou, Peoples R China
[3] Univ Hong Kong, Hong Kong, Peoples R China
[4] Deepwise AI Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR46437.2021.01111
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Referring image segmentation aims to segment the referent, that is, the object or stuff in an image that a natural language expression refers to. Its main challenge lies in how to effectively and efficiently differentiate between the referent and other objects of the same category. In this paper, we tackle the challenge by jointly performing compositional visual reasoning and accurate segmentation in a single stage via the proposed Bottom-Up Shift (BUS) and Bidirectional Attentive Refinement (BIAR) modules. Specifically, BUS progressively locates the referent along the hierarchical reasoning steps implied by the expression. At each step, it locates the corresponding visual region by disambiguating between similar regions, where the disambiguation is based on the relationships between regions. Through this explainable visual reasoning, BUS explicitly aligns linguistic components with visual regions so that it can identify all the entities mentioned in the expression. BIAR fuses multi-level features via two-way attentive message passing, which captures the visual details relevant to the referent to refine the segmentation results. Experimental results demonstrate that the proposed method, consisting of the BUS and BIAR modules, not only consistently surpasses all existing state-of-the-art algorithms across common benchmark datasets but also visualizes interpretable reasoning steps for stepwise segmentation. Code is available at https://github.com/incredibleXM/BUSNet.
Pages: 11261-11270
Number of pages: 10