An enhanced vision transformer with wavelet position embedding for histopathological image classification

Cited by: 14
Authors
Ding, Meidan [1]
Qu, Aiping [1]
Zhong, Haiqin [1]
Lai, Zhihui [2,3]
Xiao, Shuomin [1]
He, Penghui [1]
Affiliations
[1] Univ South China, Sch Comp, Hengyang 421001, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[3] Robot Soc, Shenzhen Inst Artificial Intelligence, Shenzhen 518129, Peoples R China
Keywords
Histopathological image classification; Vision transformer; Convolutional neural network; Wavelet position embedding; External multi-head attention
DOI
10.1016/j.patcog.2023.109532
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Histopathological image classification is a fundamental task in the pathological diagnosis workflow. It remains a major challenge because of the complexity of histopathological images. Recently, hybrid methods combining convolutional neural networks (CNNs) with vision transformers (ViTs) have been proposed for this task. These methods represent both global and local contextual information well and achieve excellent classification performance. However, downsampling operations such as max-pooling, which ignore the sampling theorem, pass jagged artifacts into the transformer and lead to aliasing. This makes the subsequent feature maps focus on incorrect regions and affects the final classification results. In this work, we propose an enhanced vision transformer with wavelet position embedding to tackle this challenge. In particular, a wavelet position embedding module, which introduces the wave transform into position embedding, is employed to enhance the smoothness of discontinuous feature information by decomposing sequences into amplitude and phase in pathological feature maps. In addition, an external multi-head attention is proposed to replace self-attention in the transformer block with two linear layers. It reduces the computational cost and uncovers potential correlations between different samples. We evaluate the proposed method on three challenging public histopathological classification datasets and perform a quantitative comparison with previous state-of-the-art methods. The results empirically demonstrate that our method achieves the best accuracy. Furthermore, it has the fewest parameters and very low FLOPs. In conclusion, the enhanced vision transformer shows high classification performance and demonstrates significant potential for assisting pathologists in pathological diagnosis. © 2023 Elsevier Ltd. All rights reserved.
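The abstract's idea of replacing self-attention with two linear layers can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name, the memory size `S`, the sharing of the two memory matrices across heads, and the double normalization (softmax over tokens, then L1 over memory slots) are assumptions borrowed from the commonly described external-attention formulation. The point it shows is that tokens attend to a small learned memory instead of to each other, so the cost grows linearly with the number of tokens.

```python
import numpy as np

def external_multi_head_attention(x, m_k, m_v, num_heads):
    """Sketch of external multi-head attention (assumed formulation).

    x   : (N, d) token features
    m_k : (S, d // num_heads) first linear layer (external "key" memory)
    m_v : (S, d // num_heads) second linear layer (external "value" memory)
    Both memories are shared across heads and across samples (assumption).
    """
    n, d = x.shape
    d_h = d // num_heads
    # Split channels into heads: (H, N, d_h)
    heads = x.reshape(n, num_heads, d_h).transpose(1, 0, 2)
    # First linear layer: similarity of each token to S memory slots -> (H, N, S)
    attn = heads @ m_k.T
    # Double normalization (assumption): softmax over tokens ...
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)
    # ... then L1 normalization over memory slots
    attn = attn / (attn.sum(axis=2, keepdims=True) + 1e-9)
    # Second linear layer: read values back from memory -> (H, N, d_h)
    out = attn @ m_v
    # Merge heads back to (N, d)
    return out.transpose(1, 0, 2).reshape(n, d)

# Hypothetical sizes chosen only for illustration
rng = np.random.default_rng(0)
N, d, H, S = 196, 64, 8, 32
x = rng.standard_normal((N, d))
m_k = rng.standard_normal((S, d // H))
m_v = rng.standard_normal((S, d // H))
out = external_multi_head_attention(x, m_k, m_v, num_heads=H)
```

Because the memories do not depend on the input, the attention map has shape N x S rather than N x N, which is where the reduced computation comes from; sharing the memories across samples is what lets them capture correlations between different inputs.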
Pages: 11