Object-Centric Representation Learning for Video Scene Understanding

Cited by: 0
Authors
Zhou, Yi [1]
Zhang, Hui [1]
Park, Seung-In [2]
Yoo, ByungIn [2]
Qi, Xiaojuan [3]
Affiliations
[1] Samsung R&D Inst China Beijing SRC B, Beijing 100028, Peoples R China
[2] Samsung Adv Inst Technol, Suwon 446712, South Korea
[3] Univ Hong Kong, Dept Elect & Elect Engn, Hong Kong, Peoples R China
Keywords
Semantics; Task analysis; IP networks; Feature extraction; Pipelines; Estimation; Generators; Scene understanding; video panoptic segmentation; depth estimation; tracking; object-centric representation
DOI
10.1109/TPAMI.2024.3401409
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Depth-aware Video Panoptic Segmentation (DVPS) is a challenging task that requires predicting the semantic class and 3D depth of each pixel in a video, while also segmenting and consistently tracking objects across frames. Predominant methodologies treat this as a multi-task learning problem, tackling each constituent task independently, thus restricting their capacity to leverage interrelationships amongst tasks and requiring parameter tuning for each task. To surmount these constraints, we present Slot-IVPS, a new approach employing an object-centric model to acquire unified object representations, thereby facilitating the model's ability to simultaneously capture semantic and depth information. Specifically, we introduce a novel representation, Integrated Panoptic Slots (IPS), to capture both semantic and depth information for all panoptic objects within a video, encompassing background semantics and foreground instances. Subsequently, we propose an integrated feature generator and enhancer to extract depth-aware features, alongside the Integrated Video Panoptic Retriever (IVPR), which iteratively retrieves spatial-temporal coherent object features and encodes them into IPS. The resulting IPS can be effortlessly decoded into an array of video outputs, including depth maps, classifications, masks, and object instance IDs. We undertake comprehensive analyses across four datasets, attaining state-of-the-art performance in both Depth-aware Video Panoptic Segmentation and Video Panoptic Segmentation tasks.
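The decoding step described in the abstract, where one set of object slots yields classes, masks, instance IDs, and depth, can be illustrated with a toy sketch. This is not the authors' implementation: the `Slot` class, the `decode_frame` function, the dot-product scoring, and the single depth value per slot are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Slot:
    """Hypothetical stand-in for one panoptic slot (not the paper's data structure)."""
    embedding: List[float]  # scored against pixel features to form the mask
    class_id: int           # semantic class read from the slot
    instance_id: int        # object ID, kept fixed across frames for tracking
    depth: float            # one depth value per slot (a deliberate simplification)


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def decode_frame(
    slots: List[Slot], pixel_features: List[List[List[float]]]
) -> Tuple[List[List[int]], List[List[int]], List[List[float]]]:
    """Assign each pixel to its best-scoring slot and read outputs from that slot.

    pixel_features is an H x W grid of feature vectors.
    Returns (class_map, instance_map, depth_map), each an H x W grid.
    """
    class_map, instance_map, depth_map = [], [], []
    for row in pixel_features:
        # The winning slot per pixel plays the role of the panoptic mask.
        winners = [max(slots, key=lambda s: dot(s.embedding, f)) for f in row]
        class_map.append([s.class_id for s in winners])
        instance_map.append([s.instance_id for s in winners])
        depth_map.append([s.depth for s in winners])
    return class_map, instance_map, depth_map


# Two made-up slots: a far background region and a near foreground object.
slots = [
    Slot(embedding=[1.0, 0.0], class_id=0, instance_id=0, depth=30.0),
    Slot(embedding=[0.0, 1.0], class_id=1, instance_id=7, depth=5.0),
]
# A 2 x 2 frame of made-up pixel features.
frame = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.7, 0.3], [0.1, 0.9]],
]
class_map, instance_map, depth_map = decode_frame(slots, frame)
```

Because every output is read from the same slot, the class, ID, and depth of a pixel cannot disagree about which object the pixel belongs to, which is the intuition behind a unified object representation. Reusing the same slots on the next frame would keep `instance_id` stable, a crude analogue of tracking.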
Pages: 8410-8423
Page count: 14
Related papers
50 in total
  • [31] Uni-and-Bi-Directional Video Prediction via Learning Object-Centric Transformation
    Chen, Xiongtao
    Wang, Wenmin
    IEEE TRANSACTIONS ON MULTIMEDIA, 2020, 22 (06) : 1591 - 1604
  • [32] Object-Centric Debugging
    Ressia, Jorge
    Bergel, Alexandre
    Nierstrasz, Oscar
    2012 34TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), 2012, : 485 - 495
  • [33] Semantic Tracklets: An Object-Centric Representation for Visual Multi-Agent Reinforcement Learning
    Liu, Iou-Jen
    Ren, Zhongzheng
    Yeh, Raymond A.
    Schwing, Alexander G.
    2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 5603 - 5610
  • [34] APEX: Unsupervised, Object-Centric Scene Segmentation and Tracking for Robot Manipulation
    Wu, Yizhe
    Jones, Oiwi Parker
    Engelcke, Martin
    Posner, Ingmar
    2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 3375 - 3382
  • [35] Segmenting Moving Objects via an Object-Centric Layered Representation
    Xie, Junyu
    Xie, Weidi
    Zisserman, Andrew
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [36] Rethinking Image-to-Video Adaptation: An Object-Centric Perspective
    Qian, Rui
    Ding, Shuangrui
    Lin, Dahua
    COMPUTER VISION-ECCV 2024, PT XLIII, 2025, 15101 : 329 - 348
  • [37] Self-supervised Object-Centric Learning for Videos
    Aydemir, Gorkay
    Xie, Weidi
    Guney, Fatma
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [38] Object-Centric Multiple Object Tracking
    Zhao, Zixu
    Wang, Jiaze
    Horn, Max
    Ding, Yizhuo
    He, Tong
    Bai, Zechen
    Zietlow, Dominik
    Simon-Gabriel, Carl-Johann
    Shuai, Bing
    Tu, Zhuowen
    Brox, Thomas
    Schiele, Bernt
    Fu, Yanwei
    Locatello, Francesco
    Zhang, Zheng
    Xiao, Tianjun
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2023), 2023, : 16555 - 16565
  • [39] Deep Object-Centric Representations for Generalizable Robot Learning
    Devin, Coline
    Abbeel, Pieter
    Darrell, Trevor
    Levine, Sergey
    2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2018, : 7111 - 7118
  • [40] Learning Dexterous Grasping with Object-Centric Visual Affordances
    Mandikal, Priyanka
    Grauman, Kristen
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 6169 - 6176