SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

Cited by: 0
Authors
Pradhan P.K. [1, 2]
Das A. [3 ]
Kumar A. [3 ]
Baruah U. [4 ]
Sen B. [1 ]
Ghosal P. [1 ]
Affiliations
[1] Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, Sikkim, Majitar
[2] Centre for Computers and Communication Technology, Sikkim, Chisopani, South Sikkim
[3] CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, West Bengal, Kolkata
[4] Birangana Sati Sadhani Rajyik Vishwavidyalaya, Assam, Golaghat
Keywords
Aerial image classification; Convolution neural network; DCT-DWT-FFT; Deep learning; Swin transformer
DOI
10.1007/s11042-024-19615-9
Abstract
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision transformer optimized for aerial image classification, which effectively addresses the computational challenges typically associated with transformers through a shifted window mechanism. The core of the research focuses on enhancing model performance by integrating a systematic preprocessing approach using Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT). An extensive ablation study evaluates six permutations of these techniques, aiming to identify the most effective sequence for preprocessing. Results indicate that the sequence of DCT, followed by DWT, then FFT, significantly excels, achieving a high classification accuracy of 93.16% and maintaining a rapid inference time of 0.0049 seconds per frame. This sequence’s superior performance highlights the critical role of preprocessing order in optimizing feature extraction, thereby boosting the efficacy of the classification process. SwinSight’s advancements not only set a new benchmark for aerial image analysis but also offer broader implications for enhancing image processing workflows in various applications, contributing to theoretical insights and practical improvements in image-based machine learning tasks. This paper not only offers a practical solution for aerial image classification for diverse applications such as agriculture, environmental monitoring, land use applications, security, and beyond but also presents a novel SAIOD (Sikkim Aerial Images dataset for Object Detection) to the computer vision research community, fostering added advancements. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
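The ablation over transform orderings described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how the transforms are chained, which wavelet is used, or how the complex FFT output is handled. This sketch feeds each transform's output into the next, uses a single-level Haar wavelet, and keeps only the FFT magnitude. The names `dct2`, `haar_dwt2`, `fft2_mag`, `preprocess`, `STEPS`, and `ORDERS` are hypothetical, and the Swin-based classifier itself is not shown.

```python
from itertools import permutations

import numpy as np


def dct2(x):
    """Orthonormal 2-D DCT-II built from an explicit basis matrix."""
    def basis(n):
        k = np.arange(n)[:, None]
        i = np.arange(n)[None, :]
        m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
        m[0, :] /= np.sqrt(2.0)  # rescale the DC row so the matrix is orthonormal
        return m
    return basis(x.shape[0]) @ x @ basis(x.shape[1]).T


def haar_dwt2(x):
    """Single-level 2-D Haar DWT (assumed wavelet); needs even height and width.

    The four sub-bands are tiled into one array of the input's shape,
    with the approximation band in the top-left quadrant.
    """
    lo = (x[0::2, :] + x[1::2, :]) / np.sqrt(2.0)
    hi = (x[0::2, :] - x[1::2, :]) / np.sqrt(2.0)
    rows = np.vstack([lo, hi])
    lo = (rows[:, 0::2] + rows[:, 1::2]) / np.sqrt(2.0)
    hi = (rows[:, 0::2] - rows[:, 1::2]) / np.sqrt(2.0)
    return np.hstack([lo, hi])


def fft2_mag(x):
    """Magnitude of the unitary 2-D FFT, so the output stays real-valued."""
    return np.abs(np.fft.fft2(x, norm="ortho"))


# Assumed chaining: each step consumes the previous step's output.
STEPS = {"DCT": dct2, "DWT": haar_dwt2, "FFT": fft2_mag}


def preprocess(image, order):
    """Apply the named transforms to a 2-D array in the given order."""
    out = np.asarray(image, dtype=np.float64)
    for name in order:
        out = STEPS[name](out)
    return out


# The six orderings an ablation of this kind would compare.
ORDERS = list(permutations(STEPS))

rng = np.random.default_rng(0)
image = rng.random((8, 8))  # stand-in for one grayscale aerial tile
features = {order: preprocess(image, order) for order in ORDERS}
```

In a setup like this, each entry of `features` would be passed to the same classifier so that accuracy and inference time can be compared across orderings; `("DCT", "DWT", "FFT")` corresponds to the sequence the abstract reports as best.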
Pages: 86457 - 86478
Page count: 21