Transformer With Linear-Window Attention for Feature Matching

Times Cited: 0
Authors
Shen, Zhiwei [1 ,2 ]
Kong, Bin [1 ,3 ,4 ]
Dong, Xiaoyu [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Intelligent Machines, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei Inst Phys Sci, Hefei 230026, Peoples R China
[3] Anhui Engn Lab Intelligent Driving Technol & Appli, Hefei 230088, Peoples R China
[4] Chinese Acad Sci, Innovat Res Inst Robot & Intelligent Mfg Hefei, Hefei 230088, Peoples R China
Keywords
Feature extraction; Transformers; Task analysis; Computational modeling; Computational efficiency; Memory management; Visualization; Feature matching; visual transformer; detector-free; computational complexity; low-texture
DOI
10.1109/ACCESS.2023.3328855
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
A transformer can capture long-term dependencies through an attention mechanism and can therefore be applied to various vision tasks. However, its quadratic computational complexity is a major obstacle in vision tasks that require accurate predictions. To address this limitation, this study introduces linear-window attention (LWA), a new attention model for a vision transformer. LWA restricts self-attention to nonoverlapping local windows and represents it as a linear dot product of kernel feature maps. Furthermore, the computational complexity of each window is reduced from quadratic to linear by exploiting the associativity of matrix products. In addition, we applied LWA to feature matching to construct a coarse-to-fine, detector-free feature matching method, called transformer with linear-window attention for feature matching (TRLWAM). At the coarse level, we extract dense pixel-level matches, and at the fine level, we obtain the final matching results via multi-head multilayer perceptron refinement. We demonstrated the effectiveness of LWA through replacement experiments. The results show that TRLWAM can extract dense matches from low-texture or repetitive-pattern regions in indoor environments, and achieves excellent results at a low computational cost on the MegaDepth and HPatches datasets. We believe the proposed LWA can provide new ideas for transformer applications in visual tasks.
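To make the mechanism described in the abstract concrete, the following is a minimal sketch (not the authors' released code) of linear attention computed within nonoverlapping windows. It assumes the elu(x)+1 kernel feature map commonly used in linear-attention variants; the window size, tensor shapes, and helper names (window_partition, linear_window_attention) are illustrative assumptions rather than details taken from the paper.

```python
# Illustrative sketch of linear attention inside non-overlapping windows.
# Assumption: phi(x) = elu(x) + 1 as the kernel feature map; sizes are toy values.
import torch
import torch.nn.functional as F


def window_partition(x, window_size):
    """Split a (B, H, W, C) feature map into non-overlapping windows,
    returning (num_windows * B, window_size * window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size * window_size, C)


def linear_window_attention(q, k, v, eps=1e-6):
    """Linear attention over the tokens of each window.

    q, k, v: (num_windows * B, N, C) with N tokens per window.
    Computing k^T v first (associativity of matrix products) makes the
    per-window cost linear in N instead of quadratic.
    """
    q = F.elu(q) + 1.0  # kernel feature map phi(q)
    k = F.elu(k) + 1.0  # kernel feature map phi(k)
    kv = torch.einsum("bnc,bnd->bcd", k, v)                 # aggregate once per window
    z = 1.0 / (torch.einsum("bnc,bc->bn", q, k.sum(dim=1)) + eps)  # normalizer
    return torch.einsum("bnc,bcd,bn->bnd", q, kv, z)


if __name__ == "__main__":
    B, H, W, C, ws = 2, 16, 16, 32, 8        # toy sizes
    feat = torch.randn(B, H, W, C)
    tokens = window_partition(feat, ws)
    out = linear_window_attention(tokens, tokens, tokens)   # self-attention: q = k = v
    print(out.shape)                         # (B * (H//ws) * (W//ws), ws*ws, C)
```

Under these assumptions, each window of N tokens costs O(N·C²) rather than O(N²·C), which is the reduction from quadratic to linear complexity referred to in the abstract.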
Pages: 121202-121211
Number of Pages: 10