Spatiotemporal Interaction Transformer Network for Video-Based Person Reidentification in Internet of Things

Cited by: 3

Authors:

Yang, Fan [1]

Li, Wei [2,3]

Liang, Binbin [2]

Zhang, Jianwei [1]

Affiliations:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China

[2] Sichuan Univ, Sch Aeronaut & Astronaut, Chengdu 610065, Peoples R China

[3] Beijing Inst Technol, State Key Lab Explos Sci & Technol, Beijing 100081, Peoples R China

Source:

IEEE INTERNET OF THINGS JOURNAL | 2023 / Vol. 10 / Issue 14

Keywords:

Internet of Things; local feature; person reidentification (Re-ID); spatiotemporal interaction; REPRESENTATION; ATTENTION; APPEARANCE;

DOI:

10.1109/JIOT.2023.3250652

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Video-based person reidentification, which is a significant application in the Internet of Things, aims to identify the same person in different video sequences across nonoverlapping cameras. Existing methods usually utilize temporal cues to enhance spatial features. However, these methods learn the temporal and spatial information separately, which breaks the relationship between them and ignores the positive role of temporal information for learning frame-level spatial representation in the process of spatial representation learning. In this article, we propose a novel spatiotemporal interaction transformer network (SITN) to solve this problem. To model the temporal information and the relationship between frames, we introduce a temporal interaction module (TIM) to interact between frame information. Meanwhile, we combine TIM with spatial transformer encoder to explore the positive role of temporal information in the learning procedure of the frame-level spatial feature. Moreover, we propose a transformer local learning scheme by reconstructing the 2-D spatial information of the frame patch sequences and extracting local features in a striped manner to strengthen the discriminative capability of our model. Extensive experiments are conducted on four public benchmarks. The results show that our model is superior compared with state-of-the-art methods.
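The striped local-feature step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stripe_features`, the row-major patch ordering, the use of average pooling, the grid size, and the stripe count are all assumptions made here for illustration, and any class token is assumed to have been removed beforehand.

```python
from typing import List

Vector = List[float]


def stripe_features(tokens: List[Vector], height: int, width: int,
                    num_stripes: int) -> List[Vector]:
    """Rebuild a (height x width) patch grid from a flat token sequence and
    average-pool it into `num_stripes` horizontal stripes.

    Hypothetical helper: layout and pooling choice are assumptions.
    """
    if len(tokens) != height * width:
        raise ValueError("token count must equal height * width")
    if height % num_stripes != 0:
        raise ValueError("height must be divisible by num_stripes")

    dim = len(tokens[0])
    rows_per_stripe = height // num_stripes
    # Assumed row-major layout: token i sits at row i // width, column i % width.
    grid = [tokens[r * width:(r + 1) * width] for r in range(height)]

    stripes = []
    for s in range(num_stripes):
        rows = grid[s * rows_per_stripe:(s + 1) * rows_per_stripe]
        cells = [tok for row in rows for tok in row]
        # One local feature per stripe: the mean of all tokens in the stripe.
        stripes.append([sum(tok[d] for tok in cells) / len(cells)
                        for d in range(dim)])
    return stripes


# Toy input: a 4 x 2 grid of 1-D tokens with values 0..7.
tokens = [[float(i)] for i in range(8)]
local = stripe_features(tokens, height=4, width=2, num_stripes=2)
```

In this toy case each stripe covers two rows of the grid, so the function returns one pooled vector per stripe; a real model would apply the same idea to the encoder's output tokens for each frame.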

Pages: 12537-12547

Page count: 11
