SwinSight: a hierarchical vision transformer using shifted windows to leverage aerial image classification

被引:0
|
作者
Pradhan P.K. [1 ,2 ]
Das A. [3 ]
Kumar A. [3 ]
Baruah U. [4 ]
Sen B. [1 ]
Ghosal P. [1 ]
机构
[1] Department of Information Technology, Sikkim Manipal Institute of Technology, Sikkim Manipal University, East Sikkim, Sikkim, Majitar
[2] Centre for Computers and Communication Technology, Sikkim, Chisopani, South Sikkim
[3] CVPR Unit, Indian Statistical Institute, 203 Barrackpore Trunk Road, West Bengal, Kolkata
[4] Birangana Sati Sadhani Rajyik Vishwavidyalaya, Assam, Golaghat
关键词
Aerial image classification; Convolution neural network; DCT-DWT-FFT; Deep learning; Swin transformer;
D O I
10.1007/s11042-024-19615-9
中图分类号
学科分类号
摘要
In aerial image classification, integrating advanced vision transformers with optimal preprocessing techniques is pivotal for enhancing model performance. This study presents SwinSight, a novel hierarchical vision transformer optimized for aerial image classification, which effectively addresses the computational challenges typically associated with transformers through a shifted window mechanism. The core of the research focuses on enhancing model performance by integrating a systematic preprocessing approach using Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Fast Fourier Transform (FFT). An extensive ablation study evaluates six permutations of these techniques, aiming to identify the most effective sequence for preprocessing. Results indicate that the sequence of DCT, followed by DWT, then FFT, significantly excels, achieving a high classification accuracy of 93.16% and maintaining a rapid inference time of 0.0049 seconds per frame. This sequence’s superior performance highlights the critical role of preprocessing order in optimizing feature extraction, thereby boosting the efficacy of the classification process. SwinSight’s advancements not only set a new benchmark for aerial image analysis but also offer broader implications for enhancing image processing workflows in various applications, contributing to theoretical insights and practical improvements in image-based machine learning tasks. This paper not only offers a practical solution for aerial image classification for diverse applications such as agriculture, environmental monitoring, land use applications, security, and beyond but also presents a novel SAIOD (Sikkim Aerial Images dataset for Object Detection) to the computer vision research community, fostering added advancements. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
引用
收藏
页码:86457 / 86478
页数:21
相关论文
共 50 条
  • [21] Stroke Disease Classification Using CT Scan Image with Vision Transformer Method
    Yopiangga, Alfian Prisma
    Badriyah, Tessy
    Syarif, Iwan
    Sakinah, Nur
    2024 INTERNATIONAL ELECTRONICS SYMPOSIUM, IES 2024, 2024, : 436 - 441
  • [22] Hyperspectral Image Classification Using Groupwise Separable Convolutional Vision Transformer Network
    Zhao, Zhuoyi
    Xu, Xiang
    Li, Shutao
    Plaza, Antonio
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2024, 62 : 1 - 17
  • [23] Vision Transformer With Contrastive Learning for Hyperspectral Image Classification
    Zhou, Heng
    Zhang, Xin
    Zhang, Chunlei
    Ma, Qiaoyu
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2023, 20
  • [24] CSiT: A Multiscale Vision Transformer for Hyperspectral Image Classification
    He, Wenxuan
    Huang, Weiliang
    Liao, Shuhong
    Xu, Zhen
    Yan, Jingwen
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2022, 15 : 9266 - 9277
  • [25] Hyperspectral image classification with embedded linear vision transformer
    Tan, Yunfei
    Li, Ming
    Yuan, Longfa
    Shi, Chaoshan
    Luo, Yonghang
    Wen, Guihao
    EARTH SCIENCE INFORMATICS, 2025, 18 (01)
  • [26] CrisisViT: A Robust Vision Transformer for Crisis Image Classification
    Long, Zijun
    McCreadie, Richard
    Imran, Muhammad
    Proceedings of the International ISCRAM Conference, 2023, 2023-text : 309 - 319
  • [27] Compressed-Domain Vision Transformer for Image Classification
    Ji, Ruolei
    Karam, Lina J.
    IEEE JOURNAL ON EMERGING AND SELECTED TOPICS IN CIRCUITS AND SYSTEMS, 2024, 14 (02) : 299 - 310
  • [28] Vision Transformer with hierarchical structure and windows shifting for person re-identification
    Zhang, Yinghua
    Hou, Wei
    PLOS ONE, 2023, 18 (06):
  • [29] Vision Transformer (ViT)-based Applications in Image Classification
    Huo, Yingzi
    Jin, Kai
    Cai, Jiahong
    Xiong, Huixuan
    Pang, Jiacheng
    2023 IEEE 9TH INTL CONFERENCE ON BIG DATA SECURITY ON CLOUD, BIGDATASECURITY, IEEE INTL CONFERENCE ON HIGH PERFORMANCE AND SMART COMPUTING, HPSC AND IEEE INTL CONFERENCE ON INTELLIGENT DATA AND SECURITY, IDS, 2023, : 135 - 140
  • [30] CSiT: A Multiscale Vision Transformer for Hyperspectral Image Classification
    He, Wenxuan
    Huang, Weiliang
    Liao, Shuhong
    Xu, Zhen
    Yan, Jingwen
    IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2022, 15 : 9266 - 9277