Multi-scale feature extraction and fusion with attention interaction for RGB-T tracking

Cited by: 1
Authors
Xing, Haijiao [1 ]
Wei, Wei [1 ]
Zhang, Lei [1 ]
Zhang, Yanning [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Single-object tracking; RGB-T tracking; Feature fusion; SIAMESE NETWORKS; TRACKING;
DOI
10.1016/j.patcog.2024.110917
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
RGB-T single-object tracking aims to track objects using both RGB images and thermal infrared (TIR) images. Although Siamese-based RGB-T trackers have an advantage in tracking speed, their accuracy still cannot match other state-of-the-art trackers (e.g., MDNet). In this study, we revisit existing Siamese-based RGB-T trackers and find that this lag stems from insufficient feature fusion between the RGB and TIR images, as well as incomplete interaction between the template frame and the search frame. Motivated by this, we propose a multi-scale feature extraction and fusion network with Temporal-Spatial Memory (MFATrack). Instead of fusing the RGB and TIR images with a single-scale feature map or only the high-level features of a multi-scale feature map, MFATrack adopts a new fusion strategy that fuses features from all scales, capturing contextual information in shallow layers and details in the deep layer. To learn features better suited to the tracking task, MFATrack fuses features across several consecutive frames. In addition, we propose a self-attention interaction module designed specifically for the search frame, which highlights the features in the search frame that are relevant to the target and thus facilitates rapid convergence for target localization. Experimental results demonstrate that MFATrack is not only fast but also achieves better tracking accuracy than competing methods, including MDNet-based methods and other Siamese-based trackers.
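The two ideas in the abstract — fusing RGB and TIR features at every scale of the pyramid, and applying self-attention over the search-frame features — can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the concatenation-plus-mixing fusion, the uniform mixing weights, and the plain scaled dot-product attention are all simplifying assumptions made for illustration.

```python
import numpy as np

def fuse_scales(rgb_feats, tir_feats):
    """Fuse RGB and TIR feature maps at every pyramid scale.
    Illustrative: channel concatenation followed by a learned-style
    (here uniform) channel-mixing projection back to C channels."""
    fused = []
    for r, t in zip(rgb_feats, tir_feats):
        f = np.concatenate([r, t], axis=0)                 # (2C, H, W)
        w = np.ones((r.shape[0], f.shape[0])) / f.shape[0]  # (C, 2C) mixing
        fused.append(np.einsum('cd,dhw->chw', w, f))        # (C, H, W)
    return fused

def self_attention(x):
    """Scaled dot-product self-attention over spatial positions,
    as a stand-in for the search-frame interaction module."""
    c, h, w = x.shape
    tokens = x.reshape(c, h * w).T               # (N, C) spatial tokens
    scores = tokens @ tokens.T / np.sqrt(c)      # (N, N) similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # row-wise softmax
    out = attn @ tokens                          # re-weighted tokens
    return out.T.reshape(c, h, w)

# Toy three-level feature pyramid with 4 channels per scale.
rng = np.random.default_rng(0)
rgb = [rng.standard_normal((4, s, s)) for s in (8, 4, 2)]
tir = [rng.standard_normal((4, s, s)) for s in (8, 4, 2)]

fused = fuse_scales(rgb, tir)       # all scales fused, not just the deepest
attended = self_attention(fused[-1])
print([f.shape for f in fused], attended.shape)
```

The key point mirrored from the abstract is that `fuse_scales` runs over every level of the pyramid, so shallow and deep features both contribute to the fused representation before the attention step.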
Pages: 12
Related papers
50 records
  • [31] Hierarchical Feature Fusion With Text Attention For Multi-scale Text Detection
    Liu, Chao
    Zou, Yuexian
    Guan, Wenjie
    2018 IEEE 23RD INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING (DSP), 2018,
  • [32] Adaptive feature fusion with attention mechanism for multi-scale target detection
    Ju, Moran
    Luo, Jiangning
    Wang, Zhongbo
    Luo, Haibo
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (07): : 2769 - 2781
  • [33] Multi-Scale Feature Fusion Network with Attention for Single Image Dehazing
    Pattern Recognition and Image Analysis, 2021, 31 : 608 - 615
  • [34] Integrating attention mechanism and multi-scale feature extraction for fall detection
    Chen, Hao
    Gu, Wenye
    Zhang, Qiong
    Li, Xiujing
    Jiang, Xiaojing
    HELIYON, 2024, 10 (10)
  • [35] Multi-Scale Feature Extraction Method of Hyperspectral Image with Attention Mechanism
    Xu Zhangchi
    Guo Baofeng
    Wu Wenhao
    You Jingyun
    Su Xiaotong
    LASER & OPTOELECTRONICS PROGRESS, 2024, 61 (04)
  • [36] MFCNet: Multimodal Feature Fusion Network for RGB-T Vehicle Density Estimation
    Qin, Ling-Xiao
    Sun, Hong-Mei
    Duan, Xiao-Meng
    Che, Cheng-Yue
    Jia, Rui-Sheng
    IEEE INTERNET OF THINGS JOURNAL, 2025, 12 (04): : 4207 - 4219
  • [37] A multi-scale feature extraction fusion model for human activity recognition
    Zhang, Chuanlin
    Cao, Kai
    Lu, Limeng
    Deng, Tao
    SCIENTIFIC REPORTS, 2022, 12 (01)
  • [39] Lightweight road extraction model based on multi-scale feature fusion
    Liu Y.
    Chen Y.
    Gao L.
    Hong J.
    Zhejiang Daxue Xuebao (Gongxue Ban)/Journal of Zhejiang University (Engineering Science), 2024, 58 (05): : 951 - 959
  • [40] Co-Saliency Detection Based on Multi-Scale Feature Extraction and Feature Fusion
    Zuo, Kuangji
    Liang, Huiqing
    Wang, Dechen
    Zhang, Dehua
    2022 4TH INTERNATIONAL CONFERENCE ON CONTROL AND ROBOTICS, ICCR, 2022, : 364 - 368