Learning Disentangled Representation for Multi-View 3D Object Recognition

Cited by: 23
Authors
Huang, Jingjia [1]
Yan, Wei [1]
Li, Ge [1]
Li, Thomas [2]
Liu, Shan [3]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen Grad Sch, Shenzhen 518055, Peoples R China
[2] Peking Univ, AIIT, Hangzhou 100871, Peoples R China
[3] Tencent Media Lab, Palo Alto, CA 94301 USA
Keywords
Three-dimensional displays; Solid modeling; Feature extraction; Task analysis; Computer architecture; Object recognition; Computational modeling; Multi-view 3D object; object recognition; disentangled representation; FEATURES;
DOI
10.1109/TCSVT.2021.3062190
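Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code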
0808 ; 0809 ;
Abstract
3D object recognition is an active research topic. View-based methods, which represent a 3D object by a collection of its rendered 2D views, play an important role in this field. Current view-based approaches tend to aggregate information from multiple views via pooling-based strategies to make the models invariant to view permutation, at the cost of an inevitable loss of useful features. In this paper, we introduce a new method that learns a more comprehensive descriptor for a 3D object from its views while remaining robust to view permutation. Our method disentangles the information in the set of multi-view images into a global category-related feature and a set of view-permutation-related features. To separate these two parts, an encoder-decoder-based disentangling architecture is proposed, which adds barely any extra computation compared to the baseline model. Systematic experiments on the ModelNet40, ModelNet10, and ShapeNetCore55 datasets demonstrate the effectiveness and competitive performance of the method. Code for our paper will be released soon at "https://github.com/hjjpku/multi_view_sort".
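The trade-off the abstract attributes to pooling-based aggregation can be illustrated with a minimal sketch. This is not the authors' architecture: the function name `max_pool_views` and the toy feature values are hypothetical, chosen only to show that element-wise max pooling over per-view features is invariant to view order, and that two different view sets can collapse to the same descriptor.

```python
from itertools import permutations
from typing import List, Sequence


def max_pool_views(view_features: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise max over per-view feature vectors (hypothetical helper)."""
    if not view_features:
        raise ValueError("need at least one view")
    # zip(*...) groups the i-th channel of every view together.
    return [max(channel) for channel in zip(*view_features)]


# Hypothetical features: 3 rendered views, 4 channels each.
views = [
    [0.1, 0.9, 0.3, 0.0],
    [0.7, 0.2, 0.3, 0.5],
    [0.4, 0.4, 0.8, 0.1],
]
pooled = max_pool_views(views)

# Permutation invariance: every ordering of the views pools to the same vector.
invariant = all(max_pool_views(list(p)) == pooled for p in permutations(views))

# Information loss: a very different set of views yields the same descriptor.
other_views = [
    [0.7, 0.9, 0.8, 0.5],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
collides = max_pool_views(other_views) == pooled
```

Here both `invariant` and `collides` evaluate to true, which is the sense in which pooling gains invariance to view order while discarding per-view information.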
Pages: 646 - 659
Number of pages: 14
Related papers
50 in total
  • [21] Multi-View Attentive Contextualization for Multi-View 3D Object Detection
    Liu, Xianpeng
    Zheng, Ce
    Qian, Ming
    Xue, Nan
    Chen, Chen
    Zhang, Zhebin
    Li, Chen
    Wu, Tianfu
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 16688 - 16698
  • [22] MORE: simultaneous multi-view 3D object recognition and pose estimation
    Parisotto, Tommaso
    Mukherjee, Subhaditya
    Kasaei, Hamidreza
    INTELLIGENT SERVICE ROBOTICS, 2023, 16 (04) : 497 - 508
  • [23] 3D Object Recognition via Multi-View Inspection in Unknown Environments
    Westell, Jamie
    Saeedi, Parvaneh
    11TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV 2010), 2010, : 2088 - 2095
  • [25] Fast and Robust Multi-View 3D Object Recognition in Point Clouds
    Pang, Guan
    Neumann, Ulrich
    2015 INTERNATIONAL CONFERENCE ON 3D VISION, 2015, : 171 - 179
  • [26] Multi-View Token Clustering and Fusion for 3D Object Recognition and Retrieval
    Fan, Linlong
    Ge, Yanqi
    Li, Wen
    Duan, Lixin
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 1145 - 1150
  • [27] Recognition of 3D Object Based on Multi-View Recurrent Neural Networks
    Dong S.
    Li W.-S.
    Zhang W.-Q.
    Zou K.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2020, 49 (02) : 269 - 275
  • [28] Multi-View Silhouette and Depth Decomposition for High Resolution 3D Object Representation
    Smith, Edward
    Fujimoto, Scott
    Meger, David
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [29] A COMPACT 3D REPRESENTATION FOR MULTI-VIEW VIDEO
    Salvador, Jordi
    Casas, Josep R.
    INTERNATIONAL CONFERENCE ON 3D IMAGING 2011 (IC3D 2011), 2011,
  • [30] Cyclic Refiner: Object-Aware Temporal Representation Learning for Multi-view 3D Detection and Tracking
    Guo, Mingzhe
    Zhang, Zhipeng
    Jing, Liping
    He, Yuan
    Wang, Ke
    Fan, Heng
    INTERNATIONAL JOURNAL OF COMPUTER VISION, 2024, 132 (12) : 6184 - 6206