A closer look at referring expressions for video object segmentation

Cited by: 6
Authors
Bellver, Miriam [1]
Ventura, Carles [2]
Silberer, Carina [3]
Kazakos, Ioannis [4]
Torres, Jordi [1]
Giro-i-Nieto, Xavier [5, 6]
Affiliations
[1] Barcelona Supercomp Ctr BSC, Barcelona, Spain
[2] Univ Oberta Catalunya UOC, Barcelona, Spain
[3] Univ Stuttgart, Inst NLP, Stuttgart, Germany
[4] Natl Tech Univ Athens, Athens, Greece
[5] Univ Politecn Catalunya UPC, Barcelona, Catalonia, Spain
[6] CSIC UPC, Inst Robot & Informat Ind, Barcelona, Catalonia, Spain
Keywords
Referring expressions; Video object segmentation; Vision and language
DOI
10.1007/s11042-022-13413-x
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The task of Language-guided Video Object Segmentation (LVOS) aims at generating binary masks for an object referred to by a linguistic expression. When this expression unambiguously describes an object in the scene, it is called a referring expression (RE). Our work argues that existing benchmarks used for LVOS are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the referring expressions in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, where the non-trivial REs are further annotated with seven RE semantic categories. We leverage these data to analyze the performance of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for LVOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.
Pages: 4419-4438
Number of pages: 20