Learning Disentangled Representation for Multi-View 3D Object Recognition

Cited by: 23
Authors
Huang, Jingjia [1]
Yan, Wei [1]
Li, Ge [1]
Li, Thomas [2]
Liu, Shan [3]
Affiliations
[1] Peking Univ, Sch Elect & Comp Engn, Shenzhen Grad Sch, Shenzhen 518055, Peoples R China
[2] Peking Univ, AIIT, Hangzhou 100871, Peoples R China
[3] Tencent Media Lab, Palo Alto, CA 94301 USA
Keywords
Three-dimensional displays; Solid modeling; Feature extraction; Task analysis; Computer architecture; Object recognition; Computational modeling; Multi-view 3D object; object recognition; disentangled representation; FEATURES
DOI
10.1109/TCSVT.2021.3062190
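Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract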
3D object recognition is an active research topic. In particular, view-based methods, which represent a 3D object as a collection of its rendered 2D views, play an important role in this field. Current view-based approaches tend to aggregate information from multiple views via pooling-based strategies, which makes the models invariant to view permutation at the cost of an inevitable loss of useful features. In this paper, we introduce a new method that learns a more comprehensive descriptor for a 3D object from its views while remaining robust to variations in view permutation. Our method disentangles the information in the set of multi-view images into a global category-related feature and a set of view-permutation-related features. To separate these two parts, we propose an encoder-decoder-based disentangling architecture that adds very little computation compared with the baseline model. Systematic experiments on the ModelNet40, ModelNet10, and ShapeNetCore55 datasets demonstrate the effectiveness and competitive performance of the method. Code for our paper will be released soon at "https://github.com/hjjpku/multi_view_sort".
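The trade-off the abstract attributes to pooling-based aggregation can be illustrated with a small sketch. This is not the paper's architecture: the function names (`max_pool_views`, `concat_views`) and the toy feature values are assumptions made up for illustration. It only shows that an element-wise max over per-view features is unchanged by view reordering, while keeping just one value per channel, whereas plain concatenation keeps everything but depends on view order.

```python
from itertools import permutations

def max_pool_views(view_features):
    """Element-wise max over a list of per-view feature vectors (hypothetical helper)."""
    return [max(channel) for channel in zip(*view_features)]

def concat_views(view_features):
    """Concatenate per-view feature vectors in the given order (hypothetical helper)."""
    return [x for feat in view_features for x in feat]

# Toy features for three rendered views, three channels each (made-up values).
views = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.4],
    [0.5, 0.6, 0.7],
]

pooled = max_pool_views(views)

# Check every ordering of the views against the original ordering.
pooled_is_invariant = all(
    max_pool_views(list(p)) == pooled for p in permutations(views)
)
concat_is_invariant = all(
    concat_views(list(p)) == concat_views(views) for p in permutations(views)
)

print(pooled)               # [0.9, 0.8, 0.7]
print(pooled_is_invariant)  # True
print(concat_is_invariant)  # False
```

In this toy case the pooled descriptor retains 3 of the 9 input values, which is the kind of information loss the abstract refers to; the order-dependent information that pooling discards is what the proposed method models separately as view-permutation-related features.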
Pages: 646-659
Number of pages: 14