Object-Centric Representation Learning for Video Scene Understanding

Authors
Zhou, Yi [1]
Zhang, Hui [1]
Park, Seung-In [2]
Yoo, ByungIn [2]
Qi, Xiaojuan [3]
Affiliations
[1] Samsung R&D Inst China Beijing SRC B, Beijing 100028, Peoples R China
[2] Samsung Adv Inst Technol, Suwon 446712, South Korea
[3] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
Keywords
Semantics; Task analysis; IP networks; Feature extraction; Pipelines; Estimation; Generators; Scene understanding; video panoptic segmentation; depth estimation; tracking; object-centric representation
DOI
10.1109/TPAMI.2024.3401409
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Depth-aware Video Panoptic Segmentation (DVPS) is a challenging task that requires predicting the semantic class and 3D depth of each pixel in a video, while also segmenting and consistently tracking objects across frames. Predominant methodologies treat this as a multi-task learning problem, tackling each constituent task independently, thus restricting their capacity to leverage interrelationships amongst tasks and requiring parameter tuning for each task. To surmount these constraints, we present Slot-IVPS, a new approach employing an object-centric model to acquire unified object representations, thereby facilitating the model's ability to simultaneously capture semantic and depth information. Specifically, we introduce a novel representation, Integrated Panoptic Slots (IPS), to capture both semantic and depth information for all panoptic objects within a video, encompassing background semantics and foreground instances. Subsequently, we propose an integrated feature generator and enhancer to extract depth-aware features, alongside the Integrated Video Panoptic Retriever (IVPR), which iteratively retrieves spatial-temporal coherent object features and encodes them into IPS. The resulting IPS can be effortlessly decoded into an array of video outputs, including depth maps, classifications, masks, and object instance IDs. We undertake comprehensive analyses across four datasets, attaining state-of-the-art performance in both Depth-aware Video Panoptic Segmentation and Video Panoptic Segmentation tasks.
Pages: 8410-8423
Page count: 14