Spatiotemporal Interaction Transformer Network for Video-Based Person Reidentification in Internet of Things

Cited by: 3
Authors
Yang, Fan [1]
Li, Wei [2, 3]
Liang, Binbin [2]
Zhang, Jianwei [1]
Affiliations
[1] Sichuan Univ, Coll Comp Sci, Chengdu 610065, Peoples R China
[2] Sichuan Univ, Sch Aeronaut & Astronaut, Chengdu 610065, Peoples R China
[3] Beijing Inst Technol, State Key Lab Explos Sci & Technol, Beijing 100081, Peoples R China
Keywords
Internet of Things; local feature; person reidentification (Re-ID); spatiotemporal interaction; REPRESENTATION; ATTENTION; APPEARANCE
DOI
10.1109/JIOT.2023.3250652
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Video-based person reidentification, a significant application in the Internet of Things, aims to identify the same person in different video sequences across nonoverlapping cameras. Existing methods usually use temporal cues to enhance spatial features. However, these methods learn temporal and spatial information separately, which breaks the relationship between them and ignores the positive role that temporal information can play in learning frame-level spatial representations. In this article, we propose a novel spatiotemporal interaction transformer network (SITN) to solve this problem. To model temporal information and the relationships between frames, we introduce a temporal interaction module (TIM) that exchanges information across frames. We combine TIM with a spatial transformer encoder to exploit temporal information while learning frame-level spatial features. Moreover, we propose a transformer local learning scheme that reconstructs the 2-D spatial layout of the frame patch sequences and extracts local features in a striped manner to strengthen the discriminative capability of the model. Extensive experiments on four public benchmarks show that our model outperforms state-of-the-art methods.
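The striped local learning idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the patch ordering, the stripe orientation, or the pooling operator, so the code below assumes row-major patch tokens, horizontal stripes, and average pooling. The function name `striped_local_features` and all of its parameters are hypothetical.

```python
from typing import List

Vector = List[float]


def striped_local_features(
    tokens: List[Vector], height: int, width: int, num_stripes: int
) -> List[Vector]:
    """Rebuild a 2-D patch grid from a flat token sequence and average-pool
    each horizontal stripe into one local feature vector.

    Assumptions (not stated in the abstract): tokens are in row-major order,
    stripes are horizontal, and pooling is a plain mean.
    """
    if len(tokens) != height * width:
        raise ValueError("token count must equal height * width")
    if height % num_stripes != 0:
        raise ValueError("height must be divisible by num_stripes")

    dim = len(tokens[0])
    # Reconstruct the grid: row r holds tokens[r * width : (r + 1) * width].
    grid = [tokens[r * width:(r + 1) * width] for r in range(height)]

    rows_per_stripe = height // num_stripes
    features: List[Vector] = []
    for s in range(num_stripes):
        stripe_rows = grid[s * rows_per_stripe:(s + 1) * rows_per_stripe]
        # Flatten the stripe back into a list of patch vectors.
        patches = [patch for row in stripe_rows for patch in row]
        # Mean over all patches in the stripe, one value per feature dimension.
        features.append(
            [sum(p[d] for p in patches) / len(patches) for d in range(dim)]
        )
    return features


if __name__ == "__main__":
    # Toy frame: a 4 x 2 patch grid with 1-D features 0.0 .. 7.0.
    toy_tokens = [[float(i)] for i in range(8)]
    print(striped_local_features(toy_tokens, height=4, width=2, num_stripes=2))
```

In a real model the per-stripe vectors would typically be computed for every frame and fed to separate classifiers or losses; that part is left out here because the abstract gives no detail on it.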
Pages: 12537-12547
Number of pages: 11