Fine-Grained Ship Classification by Combining CNN and Swin Transformer

Cited by: 23
Authors
Huang, Liang [1]
Wang, Fengxiang [1]
Zhang, Yalun [2]
Xu, Qingxia [3]
Affiliations
[1] Naval Univ Engn, Coll Elect Engn, Wuhan 430000, Peoples R China
[2] Naval Univ Engn, Inst Noise & Vibrat, Wuhan 430000, Peoples R China
[3] Natl Univ Def Technol, Coll Int Studies, Wuhan 430000, Peoples R China
Keywords
image classification; ship detection; remote sensing images; self-attention; transformer; CNN
DOI
10.3390/rs14133087
Chinese Library Classification (CLC) number
X [Environmental Science, Safety Science]
Discipline classification code
08; 0830
Abstract
The mainstream algorithms used for ship classification and detection can be improved based on convolutional neural networks (CNNs). By analyzing the characteristics of ship images, we found that the difficulty in ship image classification lies in distinguishing ships with similar hull structures but different equipment and superstructures. To extract features such as ship superstructures, this paper introduces a transformer architecture with self-attention into ship classification and detection, and proposes a CNN and Swin transformer model (CNN-Swin model) for ship image classification and detection. The main contributions of this study are as follows: (1) The proposed approach attends to features at different scales in ship image classification and detection, introduces a transformer architecture with self-attention into ship classification and detection for the first time, and uses a parallel network of a CNN and a transformer to extract image features. (2) To exploit the CNN's performance and avoid overfitting as much as possible, a multi-branch CNN-Block is designed and used to construct a simple and accessible CNN backbone for feature extraction. (3) The performance of the CNN-Swin model is validated on the open FGSC-23 dataset and on a dataset of typical military ship categories built from open-source images. The results show that the model achieved accuracies of 90.9% and 91.9% on the FGSC-23 dataset and the military ship dataset, respectively, outperforming nine existing state-of-the-art approaches. (4) The CNN-Swin model's ability to extract ship features is validated by using it as the backbone of three state-of-the-art detection methods on the open datasets HRSC2016 and FAIR1M. The results show the great potential of the CNN-Swin backbone with self-attention in ship detection.
Pages: 23
Related papers
50 in total
  • [1] CNN-Transformer with Stepped Distillation for Fine-Grained Visual Classification
    Xu, Qin
    Liu, Peng
    Wang, Jiahui
    Huang, Lili
    Tang, Jin
    PATTERN RECOGNITION AND COMPUTER VISION, PT IX, PRCV 2024, 2025, 15039 : 364 - 377
  • [2] Fine-Grained Image Classification Combining Swin and Multi-Scale Feature Fusion
    Xiang, Jianwen
    Chen, Minrong
    Yang, Baibing
    Computer Engineering and Applications, 2023, 59 (20) : 147 - 157
  • [3] SwinFG: A fine-grained recognition scheme based on swin transformer
    Ma, Zhipeng
    Wu, Xiaoyu
    Chu, Anzhuo
    Huang, Lei
    Wei, Zhiqiang
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 244
  • [4] Global-local feature learning for fine-grained food classification based on Swin Transformer
    Kim, Jun-Hwa
    Kim, Namho
    Won, Chee Sun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 133
  • [5] Swin-CFNet: An Attempt at Fine-Grained Urban Green Space Classification Using Swin Transformer and Convolutional Neural Network
    Wu, Yehong
    Zhang, Meng
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21
  • [6] SFRSwin: A Shallow Significant Feature Retention Swin Transformer for Fine-Grained Image Classification of Wildlife Species
    Wang, Shuai
    Han, Yubing
    Song, Shouliang
    Zhu, Honglei
    Zhang, Li
    Dong, Anming
    Yu, Jiguo
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT IX, 2024, 14433 : 232 - 243
  • [7] Survey of Vision Transformer in Fine-Grained Image Classification
    Sun, Lulu
    Liu, Jianping
    Wang, Jian
    Xing, Jialu
    Zhang, Yue
    Wang, Chenyang
    Computer Engineering and Applications, 60 (10) : 30 - 46
  • [8] LR-CNN FOR FINE-GRAINED CLASSIFICATION WITH VARYING RESOLUTION
    Chevalier, M.
    Thome, N.
    Cord, M.
    Fournier, J.
    Henaff, G.
    Dusch, E.
    2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015 : 3101 - 3105
  • [9] Fine-Grained Intoxicated Gait Classification Using a Bilinear CNN
    Li, Ruojun
    Agu, Emmanuel
    Sarwar, Atifa
    Grimone, Kristin
    Herman, Debra
    Abrantes, Ana M.
    Stein, Michael D.
    IEEE SENSORS JOURNAL, 2023, 23 (23) : 29733 - 29748
  • [10] Dual Transformer With Multi-Grained Assembly for Fine-Grained Visual Classification
    Ji, Ruyi
    Li, Jiaying
    Zhang, Libo
    Liu, Jing
    Wu, Yanjun
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 5009 - 5021