A closer look at referring expressions for video object segmentation

Cited by: 6
Authors
Bellver, Miriam [1]
Ventura, Carles [2]
Silberer, Carina [3]
Kazakos, Ioannis [4]
Torres, Jordi [1]
Giro-i-Nieto, Xavier [5, 6]
Affiliations
[1] Barcelona Supercomp Ctr BSC, Barcelona, Spain
[2] Univ Oberta Catalunya UOC, Barcelona, Spain
[3] Univ Stuttgart, Inst NLP, Stuttgart, Germany
[4] Natl Tech Univ Athens, Athens, Greece
[5] Univ Politecn Catalunya UPC, Barcelona, Catalonia, Spain
[6] CSIC UPC, Inst Robot & Informat Ind, Barcelona, Catalonia, Spain
Keywords
Referring expressions; Video object segmentation; Vision and language
DOI
10.1007/s11042-022-13413-x
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The task of Language-guided Video Object Segmentation (LVOS) aims at generating binary masks for an object referred to by a linguistic expression. When this expression unambiguously describes an object in the scene, it is called a referring expression (RE). Our work argues that existing benchmarks used for LVOS are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the referring expressions in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, where the non-trivial REs are further annotated with seven RE semantic categories. We leverage these data to analyze the performance of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for LVOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.
Pages: 4419-4438
Page count: 20