SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

Authors
Pradhan P.K. [1,2]
Das A. [3]
Kumar A. [3]
Baruah U. [4]
Sen B. [1]
Ghosal P. [1]
Affiliations
[1] Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, Sikkim, Majitar
[2] Centre for Computers and Communication Technology, Sikkim, Chisopani, South Sikkim
[3] CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, West Bengal, Kolkata
[4] Birangana Sati Sadhani Rajyik Vishwavidyalaya, Assam, Golaghat
Keywords
Aerial image classification; Convolution neural network; DCT-DWT-FFT; Deep learning; Swin transformer
DOI
10.1007/s11042-024-19615-9
Abstract
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision transformer optimized for aerial image classification, which effectively addresses the computational challenges typically associated with transformers through a shifted window mechanism. The core of the research focuses on enhancing model performance by integrating a systematic preprocessing approach using Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT). An extensive ablation study evaluates six permutations of these techniques, aiming to identify the most effective sequence for preprocessing. Results indicate that the sequence of DCT, followed by DWT, then FFT, significantly excels, achieving a high classification accuracy of 93.16% and maintaining a rapid inference time of 0.0049 seconds per frame. This sequence’s superior performance highlights the critical role of preprocessing order in optimizing feature extraction, thereby boosting the efficacy of the classification process. SwinSight’s advancements not only set a new benchmark for aerial image analysis but also offer broader implications for enhancing image processing workflows in various applications, contributing to theoretical insights and practical improvements in image-based machine learning tasks. This paper not only offers a practical solution for aerial image classification for diverse applications such as agriculture, environmental monitoring, land use applications, security, and beyond but also presents a novel SAIOD (Sikkim Aerial Images dataset for Object Detection) to the computer vision research community, fostering added advancements.
© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
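The six-ordering ablation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no details on the wavelet family, normalization, or how the complex FFT output is passed to later stages. The sketch assumes a single-level Haar wavelet, an unnormalized DCT-II, and keeping only the FFT magnitude. All function names (`dct_1d`, `haar_dwt_1d`, `dft_1d`, `separable_2d`, `preprocess`) are hypothetical, and naive pure-Python transforms are used only to keep the example self-contained.

```python
import cmath
import math
from itertools import permutations


def dct_1d(x):
    """Unnormalized DCT-II of a sequence (naive O(n^2) form)."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * (k + 0.5) * m / n) for k in range(n))
            for m in range(n)]


def haar_dwt_1d(x):
    """Single-level Haar DWT: approximation then detail coefficients.

    Assumes an even-length input.
    """
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx + detail


def dft_1d(x):
    """Naive discrete Fourier transform (same result an FFT would compute)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n) for k in range(n))
            for m in range(n)]


def separable_2d(transform, image):
    """Apply a 1D transform along every row, then along every column."""
    rows = [transform(list(row)) for row in image]
    cols = [transform(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


# Each stage maps a real 2D array to a real 2D array of the same shape.
# Assumption: the FFT stage keeps only the magnitude so later stages see reals.
STAGES = {
    "DCT": lambda img: separable_2d(dct_1d, img),
    "DWT": lambda img: separable_2d(haar_dwt_1d, img),
    "FFT": lambda img: [[abs(v) for v in row] for row in separable_2d(dft_1d, img)],
}


def preprocess(image, order):
    """Chain the named stages in the given order."""
    out = image
    for name in order:
        out = STAGES[name](out)
    return out


# The six permutations an ablation over stage order would evaluate.
ORDERINGS = list(permutations(STAGES))

if __name__ == "__main__":
    # Toy 4x4 "image"; a real pipeline would use actual aerial image tiles.
    image = [[float((r * 4 + c) % 7) for c in range(4)] for r in range(4)]
    for order in ORDERINGS:
        features = preprocess(image, order)
        print(" -> ".join(order), f"{len(features)}x{len(features[0])}")
```

In practice one would likely replace the naive transforms with library routines (for example from SciPy, PyWavelets, and NumPy) and feed each ordering's output to the classifier to compare accuracy and inference time.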
Pages: 86457 - 86478
Page count: 21
Related papers
50 in total
  • [31] HYBRID VISION TRANSFORMER MODEL FOR HYPERSPECTRAL IMAGE CLASSIFICATION
    Yang, Jiaqi
    Du, Bo
    Wu, Chen
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 1388 - 1391
  • [32] PSVT: Pyramid Shifted Window based Vision Transformer for cardiac image segmentation
    Zhang, Xingyu
    Liu, Jiacheng
    Xian, Xiaoli
    Chen, Bo
    Li, Dong
    Yang, Fei
    Zhang, Lei
    BIOMEDICAL SIGNAL PROCESSING AND CONTROL, 2025, 102
  • [33] A Local-Global Interactive Vision Transformer for Aerial Scene Classification
    Peng, Ting
    Yi, Jingjun
    Fang, Yuan
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [34] HSVLT: Hierarchical Scale-Aware Vision-Language Transformer for Multi-Label Image Classification
    Ouyang, Shuyi
    Wang, Hongyi
    Niu, Ziwei
    Bai, Zhenjia
    Xie, Shiao
    Xu, Yingying
    Tong, Ruofeng
    Chen, Yen-Wei
    Lin, Lanfen
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023, : 4768 - 4777
  • [35] Shift-ViT: Siamese Vision Transformer using Shifted Branches
    Aim, Dasom
    Kim, Hyeong Jin
    Kim, Sangwon
    Ko, Byoung Chul
    2022 37TH INTERNATIONAL TECHNICAL CONFERENCE ON CIRCUITS/SYSTEMS, COMPUTERS AND COMMUNICATIONS (ITC-CSCC 2022), 2022, : 259 - 261
  • [36] Hierarchical attention vision transformer for fine-grained visual classification
    Hu, Xiaobin
    Zhu, Shining
    Peng, Taile
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2023, 91
  • [37] Glaucoma Classification using Light Vision Transformer
    Singh P.B.
    Singh P.
    Dev H.
    Tiwari A.
    Batra D.
    Chaurasia B.K.
    EAI Endorsed Transactions on Pervasive Health and Technology, 2023, 9
  • [38] Diabetic Retinopathy Classification using Vision Transformer
    Mutawa, A. M.
    Sruthi, Sai
    2022 6TH EUROPEAN CONFERENCE ON ELECTRICAL ENGINEERING & COMPUTER SCIENCE, ELECS, 2022, : 25 - 30
  • [39] Investigation of Hierarchical Spectral Vision Transformer Architecture for Classification of Hyperspectral Imagery
    Liu, Wei
    Prasad, Saurabh
    Crawford, Melba
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62
  • [40] SPT-Swin: A Shifted Patch Tokenization Swin Transformer for Image Classification
    Ferdous, Gazi Jannatul
    Sathi, Khaleda Akhter
    Hossain, Md. Azad
    Dewan, M. Ali Akber
    IEEE ACCESS, 2024, 12 : 117617 - 117626