A closer look at referring expressions for video object segmentation

Cited by: 6
Authors
Bellver, Miriam [1]
Ventura, Carles [2]
Silberer, Carina [3]
Kazakos, Ioannis [4]
Torres, Jordi [1]
Giro-i-Nieto, Xavier [5, 6]
Affiliations
[1] Barcelona Supercomp Ctr BSC, Barcelona, Spain
[2] Univ Oberta Catalunya UOC, Barcelona, Spain
[3] Univ Stuttgart, Inst NLP, Stuttgart, Germany
[4] Natl Tech Univ Athens, Athens, Greece
[5] Univ Politecn Catalunya UPC, Barcelona, Catalonia, Spain
[6] CSIC UPC, Inst Robot & Informat Ind, Barcelona, Catalonia, Spain
Keywords
Referring expressions; Video object segmentation; Vision and language
DOI
10.1007/s11042-022-13413-x
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The task of Language-guided Video Object Segmentation (LVOS) aims at generating binary masks for an object referred to by a linguistic expression. When this expression unambiguously describes an object in the scene, it is called a referring expression (RE). Our work argues that existing benchmarks used for LVOS are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the referring expressions in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, where the non-trivial REs are further annotated with seven RE semantic categories. We leverage these data to analyze the performance of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for LVOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.
Pages: 4419 - 4438
Page count: 20
Related papers
50 in total
  • [1] A closer look at referring expressions for video object segmentation
    Miriam Bellver
    Carles Ventura
    Carina Silberer
    Ioannis Kazakos
    Jordi Torres
    Xavier Giro-i-Nieto
    Multimedia Tools and Applications, 2023, 82 : 4419 - 4438
  • [2] Video Object Segmentation with Referring Expressions
    Khoreva, Anna
    Rohrbach, Anna
    Schiele, Bernt
    COMPUTER VISION - ECCV 2018 WORKSHOPS, PT IV, 2019, 11132 : 7 - 12
  • [3] Video Object Segmentation with Language Referring Expressions
    Khoreva, Anna
    Rohrbach, Anna
    Schiele, Bernt
    COMPUTER VISION - ACCV 2018, PT IV, 2019, 11364 : 123 - 141
  • [4] Methods for Referring Video Object Segmentation
    Wei, Caiying
    Jia, Lei
    Computer Engineering and Applications, 61 (02) : 73 - 83
  • [5] Language as Queries for Referring Video Object Segmentation
    Wu, Jiannan
    Jiang, Yi
    Sun, Peize
    Yuan, Zehuan
    Luo, Ping
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022 : 4964 - 4974
  • [6] Object-Agnostic Transformers for Video Referring Segmentation
    Yang, Xu
    Wang, Hao
    Xie, De
    Deng, Cheng
    Tao, Dacheng
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 2839 - 2849
  • [7] Temporal Collection and Distribution for Referring Video Object Segmentation
    Tang, Jiajin
    Zheng, Ge
    Yang, Sibei
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023 : 15420 - 15430
  • [8] Decoupling Multimodal Transformers for Referring Video Object Segmentation
    Gao, Mingqi
    Yang, Jinyu
    Han, Jungong
    Lu, Ke
    Zheng, Feng
    Montana, Giovanni
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4518 - 4528
  • [9] Temporal Context Enhanced Referring Video Object Segmentation
    Hu, Xiao
    Hampiholi, Basavaraj
    Neumann, Heiko
    Lang, Jochen
    2024 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION, WACV 2024, 2024 : 5562 - 5571
  • [10] MRRVOS: Modular Refinement Referring Video Object Segmentation
    Duan, Zhijiang
    Sun, Yukuan
    Wang, Jianming
    WEB AND BIG DATA, 2021, 1505 : 117 - 128