HMTN: Hierarchical Multi-scale Transformer Network for 3D Shape Recognition

Times cited: 3
Authors
Zhao, Yue [1, 2]
Nie, Weizhi [1]
Gao, Zan [3]
Liu, An-an [1, 2]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin, Peoples R China
[2] Hefei Comprehens Natl Sci Ctr, Inst Artificial Intelligence, Hefei, Peoples R China
[3] Shandong Artificial Intelligence Inst, Jinan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D Shape Recognition; Transformer; Hierarchical Network
DOI
10.1145/3503161.3548140
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
As an important field of multimedia, 3D shape recognition has attracted considerable research attention in recent years. Various approaches have been proposed, among which multiview-based methods have shown promising performance. In general, an effective 3D shape recognition algorithm should take both multiview local and global visual information into consideration, and should exploit the inherent properties of the generated 3D descriptors to ensure effective feature alignment in the common space. To tackle these issues, we propose a novel Hierarchical Multi-scale Transformer Network (HMTN) for the 3D shape recognition task. In HMTN, we propose a multi-level regional transformer (MLRT) module for shape descriptor generation. MLRT includes two branches that aim to extract intra-view local characteristics by modeling region-wise dependencies and to provide supervision from multiview global information at different granularities. Specifically, MLRT can comprehensively consider the relations among different regions and focus on the discriminative parts, which improves the effectiveness of the learned descriptors. Finally, we adopt a cross-granularity contrastive learning (CCL) mechanism for shape descriptor alignment in the common space. It explores and utilizes the cross-granularity semantic correlation to guide the descriptor extraction process while performing instance alignment based on category information. We evaluate the proposed network on several public benchmarks, and HMTN achieves competitive performance compared with state-of-the-art (SOTA) methods.
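The abstract describes CCL only at a high level and does not give its loss function. As a rough illustration of what "cross-granularity contrastive learning ... based on the category information" can look like, here is a minimal sketch that assumes a supervised-contrastive (SupCon-style) formulation: each fine-granularity descriptor is the anchor, coarse-granularity descriptors of shapes with the same category label are positives, and all other coarse descriptors are negatives. This is an assumption, not the HMTN formulation; the function name `cross_granularity_contrastive_loss`, the `temperature` default, and the choice of which granularity acts as the anchor are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_granularity_contrastive_loss(
    fine: Sequence[Sequence[float]],
    coarse: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,  # hypothetical default, not taken from the paper
) -> float:
    """Hypothetical SupCon-style loss between two descriptor granularities.

    fine[i] and coarse[i] describe shape i at a fine and a coarse granularity.
    For anchor fine[i], every coarse[j] with labels[j] == labels[i] is a
    positive (this always includes j == i); all other coarse[j] are negatives.
    """
    n = len(fine)
    f = [l2_normalize(v) for v in fine]
    c = [l2_normalize(v) for v in coarse]
    total = 0.0
    for i in range(n):
        # Cosine similarity of the anchor to every coarse descriptor, scaled by temperature.
        logits = [dot(f[i], c[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        positives = [j for j in range(n) if labels[j] == labels[i]]
        # Average negative log-probability over this anchor's positives.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
    return total / n
```

When fine and coarse descriptors of the same category point in the same direction, the loss is close to zero; swapping the coarse descriptors between two categories makes it large, which is the behavior an alignment objective needs. A real implementation would compute this over mini-batch tensors in a deep-learning framework rather than with Python lists.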
Pages: 9
Related papers
50 in total
  • [31] DilateFormer: Multi-Scale Dilated Transformer for Visual Recognition
    Jiao, Jiayu
    Tang, Yu-Ming
    Lin, Kun-Yu
    Gao, Yipeng
    Ma, Andy J.
    Wang, Yaowei
    Zheng, Wei-Shi
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 8906 - 8919
  • [32] Multi-Scale Temporal Transformer For Speech Emotion Recognition
    Li, Zhipeng
    Xing, Xiaofen
    Fang, Yuanbo
    Zhang, Weibin
    Fan, Hengsheng
    Xu, Xiangmin
    INTERSPEECH 2023, 2023, : 3652 - 3656
  • [33] 3D multi-scale vision transformer for lung nodule detection in chest CT images
    Mkindu, Hassan
    Wu, Longwen
    Zhao, Yaqin
    SIGNAL IMAGE AND VIDEO PROCESSING, 2023, 17 (05) : 2473 - 2480
  • [35] Motion Energy Guided Multi-scale Heterogeneous Features for 3D Action Recognition
    Liang, Chengwu
    Qi, Lin
    Guan, Ling
    2017 IEEE VISUAL COMMUNICATIONS AND IMAGE PROCESSING (VCIP), 2017,
  • [36] Human Action Recognition via Multi-scale 3D Stationary Wavelet Analysis
    Al-Berry, Maryam N.
    Ebied, Hala M.
    Hussein, Ashraf S.
    Tolba, Mohammed F.
    2014 14TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS (HIS), 2014, : 254 - 259
  • [37] A 3D Steel Coils' Recognition Method Based on Multi-Scale Features and Pointnet
    Liu, Zixuan
    Niu, Dan
    Li, Qi
    Chen, Xisong
    Ding, Li
    Liu, Jinbo
    PROCEEDINGS OF THE 39TH CHINESE CONTROL CONFERENCE, 2020, : 5943 - 5947
  • [38] Action recognition with multi-scale trajectory-pooled 3D convolutional descriptors
    Lu, Xiusheng
    Yao, Hongxun
    Zhao, Sicheng
    Sun, Xiaoshuai
    Zhang, Shengping
    MULTIMEDIA TOOLS AND APPLICATIONS, 2019, 78 (01) : 507 - 523
  • [39] 3D face recognition: Multi-scale strategy based on geometric and local descriptors
    Abbad, Abdelghafour
    Abbad, Khalid
    Tairi, Hamid
    COMPUTERS & ELECTRICAL ENGINEERING, 2018, 70 : 525 - 537