Transformer With Linear-Window Attention for Feature Matching

Cited by: 0
|
Authors
Shen, Zhiwei [1 ,2 ]
Kong, Bin [1 ,3 ,4 ]
Dong, Xiaoyu [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Hefei Inst Intelligent Machines, Hefei 230031, Peoples R China
[2] Univ Sci & Technol China, Hefei Inst Phys Sci, Hefei 230026, Peoples R China
[3] Anhui Engn Lab Intelligent Driving Technol & Appli, Hefei 230088, Peoples R China
[4] Chinese Acad Sci, Innovat Res Inst Robot & Intelligent Mfg Hefei, Hefei 230088, Peoples R China
Keywords
Feature extraction; Transformers; Task analysis; Computational modeling; Computational efficiency; Memory management; Visualization; Feature matching; visual transformer; detector-free; computational complexity; low-texture;
DOI
10.1109/ACCESS.2023.3328855
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
A transformer can capture long-term dependencies through an attention mechanism and, hence, can be applied to various vision tasks. However, its quadratic computational complexity is a major obstacle in vision tasks that require accurate predictions. To address this limitation, this study introduces linear-window attention (LWA), a new attention model for vision transformers. LWA computes self-attention restricted to nonoverlapping local windows and represents it as a linear dot product of kernel feature maps. Furthermore, the computational complexity of each window is reduced from quadratic to linear using the associative property of matrix products. In addition, we applied LWA to feature matching to construct a coarse-to-fine, detector-free feature matching method, called transformer with linear-window attention for feature matching (TRLWAM). At the coarse level, we extracted dense pixel-level matches, and at the fine level, we obtained the final matching results via multi-head multilayer perceptron refinement. We demonstrated the effectiveness of LWA through replacement experiments. The results showed that TRLWAM can extract dense matches from low-texture or repetitive-pattern regions in indoor environments and achieves excellent results at a low computational cost on the MegaDepth and HPatches datasets. We believe the proposed LWA can provide new perspectives for transformer applications in vision tasks.
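For a concrete picture of the mechanism summarized above, the following is a minimal NumPy sketch of kernelized linear attention restricted to nonoverlapping windows: within each window, the attention is computed as phi(Q)(phi(K)^T V) with a normalization term, so the cost per window scales linearly with the window size. The elu(x)+1 feature map, the window size, and all function names are illustrative assumptions for this sketch, not the authors' implementation.

```python
import numpy as np

def elu_feature_map(x):
    # Kernel feature map phi(x) = elu(x) + 1, a common choice for linear
    # attention (assumption: the paper does not specify the kernel here).
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_window_attention(q, k, v, window):
    """Linear attention within nonoverlapping windows (sketch).

    q, k, v: arrays of shape (N, d); N is assumed divisible by `window`.
    Per window, computes phi(Q) (phi(K)^T V) / (phi(Q) phi(K)^T 1), so the
    cost is O(window * d^2) rather than O(window^2 * d).
    """
    N, d = q.shape
    out = np.empty_like(v)
    for start in range(0, N, window):
        sl = slice(start, start + window)
        Q, K, V = elu_feature_map(q[sl]), elu_feature_map(k[sl]), v[sl]
        kv = K.T @ V                   # (d, d); associativity gives linearity
        z = Q @ K.sum(axis=0)          # (window,) normalization term
        out[sl] = (Q @ kv) / (z[:, None] + 1e-6)
    return out

# Toy usage with 64 tokens of dimension 32, split into windows of 8.
q = np.random.randn(64, 32); k = np.random.randn(64, 32); v = np.random.randn(64, 32)
attn = linear_window_attention(q, k, v, window=8)
print(attn.shape)  # (64, 32)
```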
Pages: 121202-121211
Number of pages: 10
Related Papers
50 records in total
  • [31] PARAMETER-EFFICIENT VISION TRANSFORMER WITH LINEAR ATTENTION
    Zhao, Youpeng
    Tang, Huadong
    Jiang, Yingying
    Yong, A.
    Wu, Qiang
    Wang, Jun
    2023 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2023, : 1275 - 1279
  • [32] ScatterFormer: Efficient Voxel Transformer with Scattered Linear Attention
    He, Chenhang
    Li, Ruihuang
    Zhang, Guowen
    Zhang, Lei
    COMPUTER VISION - ECCV 2024, PT XXIX, 2025, 15087 : 74 - 92
  • [33] Screening Method for Feature Matching Based on Dynamic Window Motion Statistics
    Xiang H.
    Zhou L.
    Ba X.
    Chen J.
    Huanan Ligong Daxue Xuebao/Journal of South China University of Technology (Natural Science), 2020, 48 (06): : 114 - 122
  • [34] Fast Image Matching Based on Channel Attention and Feature Slicing
    Gai Shaoyan
    Huang Yanyan
    Da Feipeng
    ACTA OPTICA SINICA, 2023, 43 (22)
  • [35] Transformer-Based Local Feature Matching for Multimodal Image Registration
    Delaunay, Remi
    Zhang, Ruisi
    Pedrosa, Filipe C.
    Feizi, Navid
    Sacco, Dianne
    Patel, Rajni
    Jagadeesan, Jayender
    MEDICAL IMAGING 2024: IMAGE PROCESSING, 2024, 12926
  • [36] Adaptive Spot-Guided Transformer for Consistent Local Feature Matching
    Yu, Jiahuan
    Chang, Jiahao
    He, Jianfeng
    Zhang, Tianzhu
    Yu, Jiyang
    Wu, Feng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 21898 - 21908
  • [37] PT-Net: Pyramid Transformer Network for Feature Matching Learning
    Gong, Zhepeng
    Xiao, Guobao
    Shi, Ziwei
    Wang, Shiping
    Chen, Riqing
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73 : 1 - 11
  • [38] Hierarchical Feature Aggregation Based on Transformer for Image-Text Matching
    Dong, Xinfeng
    Zhang, Huaxiang
    Zhu, Lei
    Nie, Liqiang
    Liu, Li
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (09) : 6437 - 6447
  • [39] Efficient Linear Attention for Fast and Accurate Keypoint Matching
    Suwanwimolkul, Suwichaya
    Komorita, Satoshi
    PROCEEDINGS OF THE 2022 INTERNATIONAL CONFERENCE ON MULTIMEDIA RETRIEVAL, ICMR 2022, 2022, : 330 - 341
  • [40] RESwinT: enhanced pollen image classification with parallel window transformer and coordinate attention
    Zu, Baokai
    Cao, Tong
    Li, Yafang
    Li, Jianqiang
    Wang, Hongyuan
    Wang, Quanzeng
    VISUAL COMPUTER, 2024,