A closer look at referring expressions for video object segmentation

Cited by: 6
Authors
Bellver, Miriam [1]
Ventura, Carles [2]
Silberer, Carina [3]
Kazakos, Ioannis [4]
Torres, Jordi [1]
Giro-i-Nieto, Xavier [5,6]
Affiliations
[1] Barcelona Supercomp Ctr BSC, Barcelona, Spain
[2] Univ Oberta Catalunya UOC, Barcelona, Spain
[3] Univ Stuttgart, Inst NLP, Stuttgart, Germany
[4] Natl Tech Univ Athens, Athens, Greece
[5] Univ Politecn Catalunya UPC, Barcelona, Catalonia, Spain
[6] CSIC UPC, Inst Robot & Informat Ind, Barcelona, Catalonia, Spain
Keywords
Referring expressions; Video object segmentation; Vision and language
DOI
10.1007/s11042-022-13413-x
Chinese Library Classification (CLC)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The task of Language-guided Video Object Segmentation (LVOS) aims at generating binary masks for an object referred to by a linguistic expression. When this expression unambiguously describes an object in the scene, it is called a referring expression (RE). Our work argues that the existing benchmarks used for LVOS are mainly composed of trivial cases, in which referents can be identified with simple phrases. Our analysis relies on a new categorization of the referring expressions in the DAVIS-2017 and Actor-Action datasets into trivial and non-trivial REs, where the non-trivial REs are further annotated with seven RE semantic categories. We leverage these data to analyze the performance of RefVOS, a novel neural network that obtains competitive results for the task of language-guided image segmentation and state-of-the-art results for LVOS. Our study indicates that the major challenges for the task are related to understanding motion and static actions.
Pages: 4419-4438
Page count: 20