Spatiotemporal Interaction Transformer Network for Video-Based Person Reidentification in Internet of Things

Cited by: 3

Authors:

Yang, Fan [1]

Li, Wei [2,3]

Liang, Binbin [2]

Zhang, Jianwei [1]

Affiliations:

[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China

[2] Sichuan Univ, Sch Aeronaut & Astronaut, Chengdu 610065, Peoples R China

[3] Beijing Inst Technol, State Key Lab Explos Sci & Technol, Beijing 100081, Peoples R China

Source:

IEEE INTERNET OF THINGS JOURNAL | 2023 / Vol. 10 / Issue 14

Keywords:

Internet of Things; local feature; person reidentification (Re-ID); spatiotemporal interaction; REPRESENTATION; ATTENTION; APPEARANCE;

DOI:

10.1109/JIOT.2023.3250652

Chinese Library Classification:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Video-based person reidentification, which is a significant application in the Internet of Things, aims to identify the same person in different video sequences across nonoverlapping cameras. Existing methods usually utilize temporal cues to enhance spatial features. However, these methods learn the temporal and spatial information separately, which breaks the relationship between them and ignores the positive role of temporal information for learning frame-level spatial representation in the process of spatial representation learning. In this article, we propose a novel spatiotemporal interaction transformer network (SITN) to solve this problem. To model the temporal information and the relationship between frames, we introduce a temporal interaction module (TIM) to interact between frame information. Meanwhile, we combine TIM with spatial transformer encoder to explore the positive role of temporal information in the learning procedure of the frame-level spatial feature. Moreover, we propose a transformer local learning scheme by reconstructing the 2-D spatial information of the frame patch sequences and extracting local features in a striped manner to strengthen the discriminative capability of our model. Extensive experiments are conducted on four public benchmarks. The results show that our model is superior compared with state-of-the-art methods.
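The striped local-feature idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `striped_local_features`, the row-major patch order, the absence of a class token, and the use of average pooling within each stripe are all assumptions made for illustration. It takes the flat patch-token sequence of one frame, rebuilds its 2-D grid, and pools each horizontal stripe into one local feature vector, using plain Python lists to stay dependency-free.

```python
def striped_local_features(tokens, grid_h, grid_w, num_stripes):
    """Pool one frame's patch tokens into horizontal stripe features.

    tokens: list of grid_h * grid_w feature vectors (lists of floats),
            assumed to be in row-major patch order with no class token.
    Returns a list of num_stripes vectors, ordered top to bottom.
    """
    if len(tokens) != grid_h * grid_w:
        raise ValueError("token count must equal grid_h * grid_w")
    if grid_h % num_stripes != 0:
        raise ValueError("grid_h must be divisible by num_stripes")

    dim = len(tokens[0])
    rows_per_stripe = grid_h // num_stripes

    # Rebuild the 2-D layout: grid[r][c] is the token at row r, column c.
    grid = [tokens[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]

    features = []
    for s in range(num_stripes):
        # Collect every token that falls inside this horizontal stripe.
        stripe_rows = grid[s * rows_per_stripe:(s + 1) * rows_per_stripe]
        stripe_tokens = [tok for row in stripe_rows for tok in row]
        # Average pooling over the stripe (an assumed pooling choice).
        pooled = [
            sum(tok[k] for tok in stripe_tokens) / len(stripe_tokens)
            for k in range(dim)
        ]
        features.append(pooled)
    return features


# Toy usage: a 4 x 2 patch grid with 2-D tokens, split into 2 stripes.
toy_tokens = [[i, 10 * i] for i in range(8)]
toy_features = striped_local_features(toy_tokens, grid_h=4, grid_w=2, num_stripes=2)
```

In a video setting the same pooling would be applied to each frame of the sequence; how the paper aggregates stripe features across frames is not stated in the abstract, so it is left out here.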


Pages: 12537-12547

Page count: 11
