Temporal Correlation Vision Transformer for Video Person Re-Identification

Times cited: 0
Authors
Wu, Pengfei [1, 2]
Wang, Le [1, 2]
Zhou, Sanping [1, 2]
Hua, Gang [4]
Sun, Changyin [3]
Affiliations
[1] Xi'an Jiaotong University, National Engineering Research Center for Visual Information and Applications, National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an, People's Republic of China
[2] Xi'an Jiaotong University, Institute of Artificial Intelligence and Robotics, Xi'an, People's Republic of China
[3] Anhui University, School of Artificial Intelligence, Hefei, People's Republic of China
[4] Wormpex AI Research, Bellevue, WA, USA
Funding
National Key R&D Program of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video person re-identification (Re-ID) is the task of retrieving a target person across the cameras of a multi-camera surveillance system. Despite progress in leveraging spatio-temporal information in videos, occlusion in dense crowds remains a major obstacle. To address this issue, we propose a Temporal Correlation Vision Transformer (TCViT) for video person Re-ID. TCViT consists of a Temporal Correlation Attention (TCA) module and a Learnable Temporal Aggregation (LTA) module. The TCA module is designed to reduce the impact of non-target persons by exploiting their relative state, while the LTA module aggregates frame-level features according to their completeness. Specifically, TCA is a parameter-free module that first aligns frame-level features to restore semantic coherence across the video and then enhances the features of the target person according to their temporal correlation. In addition, unlike previous methods that treat all frames equally through a pooling layer, LTA introduces a lightweight learnable module that weights and aggregates frame-level features under the guidance of a classification score. Extensive experiments on four widely used benchmarks show that our method achieves state-of-the-art performance in video Re-ID.
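The abstract describes both modules only at a high level, so the following is a hypothetical, simplified illustration of the two ideas, not the authors' implementation. It assumes token and frame features are plain Python lists. As a stand-in for TCA it uses a parameter-free cross-frame weighting: each token is matched to its most similar token (cosine similarity) in every other frame, and tokens that persist over time are kept while transient ones are suppressed. As a stand-in for LTA it replaces the learned module with a fixed softmax over per-frame classification scores, contrasted with plain mean pooling. All function names (temporal_correlation_weights, enhance, score_weighted_pool, and so on) are invented for this sketch.

```python
import math
from typing import List

Vector = List[float]  # one token feature or one frame-level feature


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def temporal_correlation_weights(frames: List[List[Vector]]) -> List[List[float]]:
    """Hypothetical parameter-free stand-in for TCA (not the paper's formula).

    frames[t][i] is token i of frame t. Each token is matched to its most
    similar token in every other frame (a crude alignment step), and those
    best-match similarities are averaged. Tokens that persist over time score
    high; tokens seen in only one frame (e.g. a passing occluder) score low.
    """
    weights = []
    for t, frame in enumerate(frames):
        row = []
        for tok in frame:
            best = [max(cosine(tok, other) for other in frames[s])
                    for s in range(len(frames)) if s != t]
            row.append(sum(best) / len(best) if best else 1.0)
        weights.append(row)
    return weights


def enhance(frames: List[List[Vector]]) -> List[List[Vector]]:
    """Scale each token by its (clamped non-negative) correlation weight."""
    weights = temporal_correlation_weights(frames)
    return [[[max(w, 0.0) * x for x in tok] for tok, w in zip(frame, ws)]
            for frame, ws in zip(frames, weights)]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def mean_pool(feats: List[Vector]) -> Vector:
    """Baseline: every frame contributes equally."""
    return [sum(col) / len(feats) for col in zip(*feats)]


def score_weighted_pool(feats: List[Vector], scores: List[float]) -> Vector:
    """Hypothetical stand-in for LTA: a fixed softmax over per-frame scores
    (e.g. classification confidence) instead of a learned weighting module."""
    w = softmax(scores)
    return [sum(wt * x for wt, x in zip(w, col)) for col in zip(*feats)]


if __name__ == "__main__":
    frames = [
        [[1.0, 0.0], [0.0, 1.0]],   # token 0 recurs in every frame
        [[1.0, 0.0], [0.0, -1.0]],  # token 1 changes from frame to frame
        [[1.0, 0.0], [-1.0, 0.0]],
    ]
    print(temporal_correlation_weights(frames)[0])  # [1.0, 0.0]
    feats = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_pool(feats))                    # [0.5, 0.5]
    print(score_weighted_pool(feats, [10.0, 0.0]))  # leans toward frame 0
```

In the toy input, the token that recurs in all three frames keeps weight 1.0 while the transient token drops to 0.0. With equal scores, score_weighted_pool reduces exactly to mean_pool, which is the "treat each frame equally" baseline the abstract contrasts with; unequal scores shift the aggregate toward the frames the classifier is more confident about.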
Pages: 6083-6091
Number of pages: 9