SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

Authors
Pradhan P.K. [1, 2]
Das A. [3]
Kumar A. [3]
Baruah U. [4]
Sen B. [1]
Ghosal P. [1]
Affiliations
[1] Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, Sikkim, Majitar
[2] Centre for Computers and Communication Technology, Sikkim, Chisopani, South Sikkim
[3] CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, West Bengal, Kolkata
[4] Birangana Sati Sadhani Rajyik Vishwavidyalaya, Assam, Golaghat
Keywords
Aerial image classification; Convolution neural network; DCT-DWT-FFT; Deep learning; Swin transformer
DOI
10.1007/s11042-024-19615-9
Abstract
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision transformer optimized for aerial image classification, which effectively addresses the computational challenges typically associated with transformers through a shifted window mechanism. The core of the research focuses on enhancing model performance by integrating a systematic preprocessing approach using Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT). An extensive ablation study evaluates six permutations of these techniques, aiming to identify the most effective sequence for preprocessing. Results indicate that the sequence of DCT, followed by DWT, then FFT, significantly excels, achieving a high classification accuracy of 93.16% and maintaining a rapid inference time of 0.0049 seconds per frame. This sequence’s superior performance highlights the critical role of preprocessing order in optimizing feature extraction, thereby boosting the efficacy of the classification process. SwinSight’s advancements not only set a new benchmark for aerial image analysis but also offer broader implications for enhancing image processing workflows in various applications, contributing to theoretical insights and practical improvements in image-based machine learning tasks. This paper not only offers a practical solution for aerial image classification for diverse applications such as agriculture, environmental monitoring, land use applications, security, and beyond but also presents a novel SAIOD (Sikkim Aerial Images dataset for Object Detection) to the computer vision research community, fostering added advancements. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
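The abstract names the DCT, then DWT, then FFT ordering but does not say how each transform's output feeds the next. The following is only a minimal sketch under stated assumptions, not the authors' pipeline: it assumes a single-channel image, an orthonormal DCT-II, a single-level Haar wavelet that keeps only the approximation band, and a log-magnitude FFT so that every stage stays real-valued. The names `preprocess`, `STEPS`, and `ORDERS` are hypothetical.

```python
from itertools import permutations

import numpy as np


def dct2(x):
    """Orthonormal 2-D DCT-II built from an explicit basis matrix."""
    def basis(n):
        k = np.arange(n)[:, None]
        i = np.arange(n)[None, :]
        m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
        return m
    return basis(x.shape[0]) @ x @ basis(x.shape[1]).T


def haar_dwt2(x):
    """Single-level 2-D Haar DWT; keeps only the approximation (LL) band.

    Assumption: the detail bands are discarded, which halves each dimension.
    """
    x = x[: x.shape[0] // 2 * 2, : x.shape[1] // 2 * 2]  # crop to even size
    rows = (x[0::2, :] + x[1::2, :]) / np.sqrt(2.0)
    return (rows[:, 0::2] + rows[:, 1::2]) / np.sqrt(2.0)


def fft2_mag(x):
    """Log-magnitude of the 2-D FFT (assumption: phase is dropped)."""
    return np.log1p(np.abs(np.fft.fft2(x)))


# Hypothetical registry of the three transforms named in the abstract.
STEPS = {"DCT": dct2, "DWT": haar_dwt2, "FFT": fft2_mag}


def preprocess(image, order=("DCT", "DWT", "FFT")):
    """Apply the transforms in the given order to a 2-D array."""
    out = np.asarray(image, dtype=np.float64)
    for name in order:
        out = STEPS[name](out)
    return out


# The six orderings that an ablation over three transforms would compare.
ORDERS = list(permutations(STEPS))
```

Under these assumptions, calling `preprocess(image, order)` for each entry of `ORDERS` and passing the result to the same classifier would give one accuracy figure per ordering, which is the kind of comparison the ablation study describes.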
Pages: 86457-86478
Page count: 21