Recombining Vision Transformer Architecture for Fine-Grained Visual Categorization

Cited by: 1
Authors
Deng, Xuran [1]
Liu, Chuanbin [1]
Lu, Zhiying [1]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230026, Peoples R China
Source
Funding
China Postdoctoral Science Foundation
Keywords
Fine-Grained Visual Categorization; Vision Transformer
DOI
10.1007/978-3-031-27818-1_11
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Fine-grained visual categorization (FGVC) is a challenging image analysis task that requires comprehensive extraction and representation of discriminative features. To address this, previous works focus on designing complex modules, the so-called necks and heads, on top of simple backbones, at the cost of a heavy computational burden. In this paper, we offer a new insight: Vision Transformer (ViT) itself is an all-in-one FGVC framework consisting of a basic Backbone for feature extraction, a Neck for further feature enhancement and a Head for selecting discriminative features. We delve into the feature extraction and representation pattern of ViT for FGVC and empirically show that simply recombining the original ViT structure to leverage multi-level semantic representations, without introducing any additional parameters, achieves higher performance. Based on this insight, we propose RecViT, a simple recombination and modification of the original ViT that captures multi-level semantic features and facilitates fine-grained recognition. In RecViT, the deep layers of the original ViT serve as the Head, a few middle layers as the Neck and the shallow layers as the Backbone. In addition, we adopt an optional Feature Processing Module to enhance the discriminative feature representation at each semantic level and align these representations for final recognition. With these simple modifications, RecViT obtains significant accuracy improvements on the FGVC benchmarks CUB-200-2011, Stanford Cars and Stanford Dogs.
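The recombination described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the abstract does not say how many layers go to each role or how the levels are fused, so the names `split_blocks` and `forward_multilevel`, the 7/3/2 split, the reuse of the same Head on every Neck-level feature, and the averaging step are all assumptions made for illustration. Toy blocks that add 1.0 to each element stand in for transformer layers.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Block = Callable[[Vector], Vector]


def split_blocks(
    blocks: Sequence[Block], n_neck: int, n_head: int
) -> Tuple[Sequence[Block], Sequence[Block], Sequence[Block]]:
    """Partition an existing block stack into (backbone, neck, head).

    No block is created or copied, so no parameters are added.
    The split sizes are hypothetical; the abstract does not give them.
    """
    depth = len(blocks)
    if n_neck < 1 or n_head < 1 or n_neck + n_head >= depth:
        raise ValueError("need at least one block in each of the three roles")
    n_backbone = depth - n_neck - n_head
    backbone = blocks[:n_backbone]
    neck = blocks[n_backbone:n_backbone + n_neck]
    head = blocks[n_backbone + n_neck:]
    return backbone, neck, head


def forward_multilevel(
    x: Vector,
    backbone: Sequence[Block],
    neck: Sequence[Block],
    head: Sequence[Block],
) -> Vector:
    """Run the backbone, keep one feature per neck block, send each through the head."""
    for block in backbone:
        x = block(x)

    # Each neck block output is treated as one semantic level (assumption).
    levels: List[Vector] = []
    for block in neck:
        x = block(x)
        levels.append(list(x))

    # The same head blocks process every level (assumption).
    outputs: List[Vector] = []
    for feature in levels:
        for block in head:
            feature = block(feature)
        outputs.append(feature)

    # Average the per-level outputs as a stand-in for the final fusion (assumption).
    n = len(outputs)
    return [sum(values) / n for values in zip(*outputs)]


def toy_block(v: Vector) -> Vector:
    """Stand-in for a transformer layer: adds 1.0 to every element."""
    return [e + 1.0 for e in v]


blocks = [toy_block] * 12  # a ViT-Base-like depth of 12, chosen for illustration
backbone, neck, head = split_blocks(blocks, n_neck=3, n_head=2)
result = forward_multilevel([0.0, 1.0], backbone, neck, head)
```

With this toy stack, the shallow seven blocks act as the Backbone, the next three as the Neck and the last two as the Head. The optional Feature Processing Module mentioned in the abstract is not modeled here.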
Pages: 127-138
Number of pages: 12
Related papers
50 in total
  • [1] Multistage attention region supplement transformer for fine-grained visual categorization
    Mei, Aokun
    Huo, Hua
    Xu, Jiaxin
    Xu, Ningya
    VISUAL COMPUTER, 2025, 41 (03): : 1873 - 1889
  • [2] Hierarchical attention vision transformer for fine-grained visual classification
    Hu, Xiaobin
    Zhu, Shining
    Peng, Taile
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 91
  • [3] Fine-grained visual classification based on compact Vision transformer
    Xu H.
    Guo L.
    Li R.-Z.
    Kongzhi yu Juece/Control and Decision, 2024, 39 (03): : 893 - 900
  • [4] MFVT: Multilevel Feature Fusion Vision Transformer and RAMix Data Augmentation for Fine-Grained Visual Categorization
    Lv, Xinyao
    Xia, Hao
    Li, Na
    Li, Xudong
    Lan, Ruoming
    ELECTRONICS, 2022, 11 (21)
  • [5] Optimized lightweight CA-transformer: Using transformer for fine-grained visual categorization
    Wang, Haiqing
    Shang, Shuqi
    Wang, Dongwei
    He, Xiaoning
    Feng, Kai
    Zhu, Hao
    Li, Chengpeng
    Wang, Yuetao
    ECOLOGICAL INFORMATICS, 2022, 71
  • [6] Feathers Dataset for Fine-Grained Visual Categorization
    Belko, Alina
    Dobratulin, Konstantin
    Kuznetsov, Andrey
    THIRTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2020), 2021, 11605
  • [7] SIM-Trans: Structure Information Modeling Transformer for Fine-grained Visual Categorization
    Sun, Hongbo
    He, Xiangteng
    Peng, Yuxin
    PROCEEDINGS OF THE 30TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2022, 2022, : 5853 - 5861
  • [8] Coarse-to-Fine Description for Fine-Grained Visual Categorization
    Yao, Hantao
    Zhang, Shiliang
    Zhang, Yongdong
    Li, Jintao
    Tian, Qi
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (10) : 4858 - 4872
  • [9] FINE-GRAINED VISUAL CATEGORIZATION WITH FINE-TUNED SEGMENTATION
    Li, Lingyun
    Guo, Yanqing
    Xie, Lingxi
    Kong, Xiangwei
    Tian, Qi
    2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015, : 2025 - 2029
  • [10] TransFG: A Transformer Architecture for Fine-Grained Recognition
    He, Ju
    Chen, Jie-Neng
    Liu, Shuai
    Kortylewski, Adam
    Yang, Cheng
    Bai, Yutong
    Wang, Changhu
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELVETH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 852 - 860