An enhanced vision transformer with wavelet position embedding for histopathological image classification

Cited by: 14
Authors
Ding, Meidan [1]
Qu, Aiping [1]
Zhong, Haiqin [1]
Lai, Zhihui [2, 3]
Xiao, Shuomin [1]
He, Penghui [1]
Affiliations
[1] Univ South China, Sch Comp, Hengyang 421001, Peoples R China
[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[3] Robot Soc, Shenzhen Inst Artificial Intelligence, Shenzhen 518129, Peoples R China
Keywords
Histopathological image classification; Vision transformer; Convolutional neural network; Wavelet position embedding; External multi-head attention
DOI
10.1016/j.patcog.2023.109532
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Histopathological image classification is a fundamental task in the pathological diagnosis workflow. It remains a major challenge because of the complexity of histopathological images. Recently, hybrid methods combining convolutional neural networks (CNNs) with vision transformers (ViTs) have been proposed in this field. These methods represent both global and local contextual information well and achieve excellent classification performance. However, downsampling operations such as max-pooling, which ignore the sampling theorem, pass jagged artifacts into the transformer and lead to aliasing. This makes the subsequent feature maps focus on incorrect regions and degrades the final classification results. In this work, we propose an enhanced vision transformer with wavelet position embedding to tackle this challenge. In particular, a wavelet position embedding module, which introduces the wave transform into position embedding, is employed to smooth discontinuous feature information by decomposing sequences into amplitude and phase in pathological feature maps. In addition, an external multi-head attention is proposed to replace the self-attention in the transformer block with two linear layers. It reduces the computational cost and mines potential correlations between different samples. We evaluate the proposed method on three challenging public histopathological classification datasets and perform a quantitative comparison with previous state-of-the-art methods. The results empirically demonstrate that our method achieves the best accuracy. Furthermore, it has the fewest parameters and very low FLOPs. In conclusion, the enhanced vision transformer shows high classification performance and demonstrates significant potential for assisting pathologists in pathological diagnosis. (c) 2023 Elsevier Ltd. All rights reserved.
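The abstract describes replacing self-attention with two linear layers. The sketch below illustrates one common way this kind of external attention is written, in NumPy. It is not the authors' implementation: the function name, the memory shapes, the sharing of the two memories across heads, and the double normalization (softmax over tokens, then L1 over memory slots) are all assumptions made for illustration, since this record gives no such details.

```python
import numpy as np

def external_multi_head_attention(x, mem_k, mem_v, num_heads):
    """Hypothetical sketch of external multi-head attention.

    x:     (n_tokens, d_model) input token features
    mem_k: (d_head, s) first linear layer (assumed shared across heads)
    mem_v: (s, d_head) second linear layer (assumed shared across heads)
    """
    n, d = x.shape
    assert d % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d // num_heads

    # Split channels into heads: (num_heads, n_tokens, d_head)
    h = x.reshape(n, num_heads, d_head).transpose(1, 0, 2)

    # First linear layer: token-to-memory affinities, (num_heads, n_tokens, s)
    attn = h @ mem_k

    # Assumed double normalization: softmax over tokens, then L1 over slots
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn = attn / attn.sum(axis=1, keepdims=True)
    attn = attn / (attn.sum(axis=2, keepdims=True) + 1e-9)

    # Second linear layer: read out from memory, (num_heads, n_tokens, d_head)
    out = attn @ mem_v

    # Merge heads back to (n_tokens, d_model)
    return out.transpose(1, 0, 2).reshape(n, d)

# Usage with random data; all sizes are arbitrary illustrative choices.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 32))       # 16 tokens, 32 channels
memory_k = rng.standard_normal((8, 4)) * 0.1  # d_head = 8, s = 4 memory slots
memory_v = rng.standard_normal((4, 8)) * 0.1
features = external_multi_head_attention(tokens, memory_k, memory_v, num_heads=4)
print(features.shape)
```

Because the two memories are learned parameters rather than projections of the current input, the cost in this sketch grows linearly with the number of tokens, which is consistent with the reduced computation the abstract mentions.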
Pages: 11
Related papers
50 in total
  • [41] Vision Transformer With Hybrid Shifted Windows for Gastrointestinal Endoscopy Image Classification
    Wang, Wei
    Yang, Xin
    Tang, Jinhui
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4452 - 4461
  • [42] Transformer with convolution and graph-node co-embedding: An accurate and interpretable vision backbone for predicting gene expressions from local histopathological image
    Xiao, Xiao
    Kong, Yan
    Li, Ronghan
    Wang, Zuoheng
    Lu, Hui
    MEDICAL IMAGE ANALYSIS, 2024, 91
  • [43] Image Dehazing Transformer with Transmission-Aware 3D Position Embedding
    Guo, Chunle
    Yan, Qixin
    Anwar, Saeed
    Cong, Runmin
    Ren, Wenqi
    Li, Chongyi
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022 : 5802 - 5810
  • [44] Vision-Enhanced and Consensus-Aware Transformer for Image Captioning
    Cao, Shan
    An, Gaoyun
    Zheng, Zhenxing
    Wang, Zhiyong
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (10) : 7005 - 7018
  • [45] Vision Transformer with pre-positional embedding
    Eguchi, Takuro
    Kuroki, Yoshimitsu
    INTERNATIONAL WORKSHOP ON ADVANCED IMAGING TECHNOLOGY, IWAIT 2024, 2024, 13164
  • [46] Breast cancer histopathology image classification using transformer with discrete wavelet transform
    Yan, Yuting
    Lu, Ruidong
    Sun, Jian
    Zhang, Jianxin
    Zhang, Qiang
    MEDICAL ENGINEERING & PHYSICS, 2025, 138
  • [48] Rethinking Position Embedding Methods in the Transformer Architecture
    Zhou, Xin
    Ren, Zhaohui
    Zhou, Shihua
    Jiang, Zeyu
    Yu, Tianzhuang
    Luo, Hengfa
    NEURAL PROCESSING LETTERS, 2024, 56 (02)
  • [49] TransPath: Transformer-Based Self-supervised Learning for Histopathological Image Classification
    Wang, Xiyue
    Yang, Sen
    Zhang, Jun
    Wang, Minghui
    Zhang, Jing
    Huang, Junzhou
    Yang, Wei
    Han, Xiao
    MEDICAL IMAGE COMPUTING AND COMPUTER ASSISTED INTERVENTION - MICCAI 2021, PT VIII, 2021, 12908 : 186 - 195
  • [50] DACTransNet: A Hybrid CNN-Transformer Network for Histopathological Image Classification of Pancreatic Cancer
    Kou, Yongqing
    Xia, Cong
    Jiao, Yiping
    Zhang, Daoqiang
    Ge, Rongjun
    ARTIFICIAL INTELLIGENCE, CICAI 2023, PT II, 2024, 14474 : 422 - 434