Object-Centric Representation Learning for Video Scene Understanding

Cited by: 0
Authors
Zhou, Yi [1]
Zhang, Hui [1]
Park, Seung-In [2]
Yoo, ByungIn [2]
Qi, Xiaojuan [3]
Affiliations
[1] Samsung R&D Inst China Beijing SRC B, Beijing 100028, Peoples R China
[2] Samsung Adv Inst Technol, Suwon 446712, South Korea
[3] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
Keywords
Semantics; Task analysis; IP networks; Feature extraction; Pipelines; Estimation; Generators; Scene understanding; video panoptic segmentation; depth estimation; tracking; object-centric representation;
DOI
10.1109/TPAMI.2024.3401409
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Depth-aware Video Panoptic Segmentation (DVPS) is a challenging task that requires predicting the semantic class and 3D depth of each pixel in a video, while also segmenting and consistently tracking objects across frames. Predominant methodologies treat this as a multi-task learning problem, tackling each constituent task independently, thus restricting their capacity to leverage interrelationships amongst tasks and requiring parameter tuning for each task. To surmount these constraints, we present Slot-IVPS, a new approach employing an object-centric model to acquire unified object representations, thereby facilitating the model's ability to simultaneously capture semantic and depth information. Specifically, we introduce a novel representation, Integrated Panoptic Slots (IPS), to capture both semantic and depth information for all panoptic objects within a video, encompassing background semantics and foreground instances. Subsequently, we propose an integrated feature generator and enhancer to extract depth-aware features, alongside the Integrated Video Panoptic Retriever (IVPR), which iteratively retrieves spatial-temporal coherent object features and encodes them into IPS. The resulting IPS can be effortlessly decoded into an array of video outputs, including depth maps, classifications, masks, and object instance IDs. We undertake comprehensive analyses across four datasets, attaining state-of-the-art performance in both Depth-aware Video Panoptic Segmentation and Video Panoptic Segmentation tasks.
Pages: 8410-8423
Number of pages: 14
Related papers
50 in total
  • [21] Provably Learning Object-Centric Representations
    Brady, Jack
    Zimmermann, Roland S.
    Sharma, Yash
    Schoelkopf, Bernhard
    von Kuegelgen, Julius
    Brendel, Wieland
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [22] Object-Centric Learning with Slot Attention
    Locatello, Francesco
    Weissenborn, Dirk
    Unterthiner, Thomas
    Mahendran, Aravindh
    Heigold, Georg
    Uszkoreit, Jakob
    Dosovitskiy, Alexey
    Kipf, Thomas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [23] Deep convolution neural network with scene-centric and object-centric information for object detection
    Shen, Zong-Ying
    Han, Shiang-Yu
    Fu, Li-Chen
    Hsiao, Pei-Yung
    Lau, Yo-Chung
    Chang, Sheng-Jen
    IMAGE AND VISION COMPUTING, 2019, 85 : 14 - 25
  • [24] Spatially Invariant Unsupervised 3D Object-Centric Learning and Scene Decomposition
    Wang, Tianyu
    Liu, Miaomiao
    Ng, Kee Siong
    COMPUTER VISION, ECCV 2022, PT XXIII, 2022, 13683 : 120 - 135
  • [25] Object-Centric Street Scene Synthesis with Generative Adversarial Networks
    Van den Abeele, Maxim
    Neven, Davy
    De Brabandere, Bert
    Proesmans, Marc
    Van Gool, Luc
    20TH IEEE MEDITERRANEAN ELECTROTECHNICAL CONFERENCE (IEEE MELECON 2020), 2020, : 665 - 671
  • [26] InstMove: Instance Motion for Object-centric Video Segmentation
    Liu, Qihao
    Wu, Junfeng
    Jiang, Yi
    Bai, Xiang
    Yuille, Alan
    Bai, Song
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR, 2023, : 6344 - 6354
  • [27] Generalization and Robustness Implications in Object-Centric Learning
    Dittadi, Andrea
    Papa, Samuele
    De Vita, Michele
    Scholkopf, Bernhard
    Winther, Ole
    Locatello, Francesco
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [28] OBJECT-CENTRIC VIDEO PREDICTION VIA DECOUPLING OF OBJECT DYNAMICS AND INTERACTIONS
    Villar-Corrales, Angel
    Wahdan, Ismail
    Behnke, Sven
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 570 - 574
  • [29] Object-centric Learning with Capsule Networks: A Survey
    Ribeiro, Fabio De Sousa
    Duarte, Kevin
    Everett, Miles
    Leontidis, Georgios
    Shah, Mubarak
    ACM COMPUTING SURVEYS, 2024, 56 (11)
  • [30] Object-Centric Video Anomaly Detection with Covariance Features
    Bilecen, Ali Enver
    Ozkan, Huseyin
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,