Recombining Vision Transformer Architecture for Fine-Grained Visual Categorization

Cited by: 1
Authors
Deng, Xuran [1]
Liu, Chuanbin [1]
Lu, Zhiying [1]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Peoples R China
Source
Funding
China Postdoctoral Science Foundation;
Keywords
Fine-Grained Visual Categorization; Vision Transformer;
DOI
10.1007/978-3-031-27818-1_11
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fine-grained visual categorization (FGVC) is a challenging image analysis task that requires comprehensive extraction and representation of discriminative features. To address it, previous works focus on designing complex modules, the so-called necks and heads, on top of simple backbones, at the cost of a heavy computational burden. In this paper, we offer a new insight: the Vision Transformer (ViT) is itself an all-in-one FGVC framework, consisting of a basic Backbone for feature extraction, a Neck for further feature enhancement, and a Head for selecting discriminative features. We examine the feature extraction and representation patterns of ViT for FGVC and show empirically that simply recombining the original ViT structure to leverage multi-level semantic representations, without introducing any additional parameters, achieves higher performance. Based on this insight, we propose RecViT, a simple recombination and modification of the original ViT that captures multi-level semantic features and facilitates fine-grained recognition. In RecViT, the deep layers of the original ViT serve as the Head, a few middle layers as the Neck, and the shallow layers as the Backbone. In addition, we adopt an optional Feature Processing Module that enhances the discriminative feature representation at each semantic level and aligns the levels for final recognition. With these simple modifications, RecViT obtains significant accuracy improvements on the FGVC benchmarks CUB-200-2011, Stanford Cars, and Stanford Dogs.
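The layer recombination described in the abstract can be illustrated with a minimal sketch. It only partitions encoder-layer indices into Backbone, Neck and Head groups; it is not the authors' implementation. The function name `split_vit_layers` and the 8 / 3 / 1 split are hypothetical, since the abstract does not state how many layers go to each group.

```python
from typing import Dict, List


def split_vit_layers(num_layers: int, num_neck: int, num_head: int) -> Dict[str, List[int]]:
    """Partition encoder-layer indices into Backbone, Neck and Head groups.

    Shallow layers go to the Backbone, a few middle layers to the Neck,
    and the deepest layers to the Head. The group sizes are hypothetical
    inputs; the abstract does not specify them.
    """
    if num_neck < 0 or num_head < 1:
        raise ValueError("need num_neck >= 0 and num_head >= 1")
    num_backbone = num_layers - num_neck - num_head
    if num_backbone < 1:
        raise ValueError("at least one shallow layer must remain for the Backbone")
    indices = list(range(num_layers))
    return {
        "backbone": indices[:num_backbone],
        "neck": indices[num_backbone:num_backbone + num_neck],
        "head": indices[num_backbone + num_neck:],
    }


# Example: a 12-layer ViT with an assumed 8 / 3 / 1 split.
groups = split_vit_layers(num_layers=12, num_neck=3, num_head=1)
```

Because the split only reassigns existing layers to roles, it adds no parameters, which matches the abstract's claim about the recombination itself; the optional Feature Processing Module is not modeled here.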
Pages: 127 - 138
Page count: 12