Temporal Correlation Vision Transformer for Video Person Re-Identification

Cited by: 0
Authors
Wu, Pengfei [1, 2]
Wang, Le [1, 2]
Zhou, Sanping [1, 2]
Hua, Gang [4]
Sun, Changyin [3]
Affiliations
[1] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intel, Xian, Peoples R China
[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China
[3] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China
[4] Wormpex AI Res, Bellevue, WA USA
Funding
National Key Research and Development Program
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Video Person Re-Identification (Re-ID) is a task of retrieving persons from multi-camera surveillance systems. Despite the progress made in leveraging spatio-temporal information in videos, occlusion in dense crowds still hinders further progress. To address this issue, we propose a Temporal Correlation Vision Transformer (TCViT) for video person Re-ID. TCViT consists of a Temporal Correlation Attention (TCA) module and a Learnable Temporal Aggregation (LTA) module. The TCA module is designed to reduce the impact of non-target persons by relative state, while the LTA module is used to aggregate frame-level features based on their completeness. Specifically, TCA is a parameter-free module that first aligns frame-level features to restore semantic coherence in videos and then enhances the features of the target person according to temporal correlation. Additionally, unlike previous methods that treat each frame equally with a pooling layer, LTA introduces a lightweight learnable module to weigh and aggregate frame-level features under the guidance of a classification score. Extensive experiments on four prevalent benchmarks demonstrate that our method achieves state-of-the-art performance in video Re-ID.
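The contrast the abstract draws between equal-weight pooling and score-guided aggregation can be illustrated with a minimal sketch. This is not the paper's LTA module: the per-frame scores here are hypothetical stand-ins for whatever a learned module would produce, the softmax weighting is an assumed choice, and the function names are invented for illustration.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Baseline: average frame-level features, treating every frame equally."""
    count = len(frames)
    return [sum(column) / count for column in zip(*frames)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw per-frame scores into weights that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_aggregate(
    frames: Sequence[Sequence[float]], scores: Sequence[float]
) -> List[float]:
    """Aggregate frame-level features with per-frame weights.

    `scores` is a hypothetical input: one number per frame, higher meaning
    the frame is considered more complete (for example, less occluded).
    """
    if len(frames) != len(scores):
        raise ValueError("need exactly one score per frame")
    weights = softmax(scores)
    return [
        sum(w * value for w, value in zip(weights, column))
        for column in zip(*frames)
    ]


if __name__ == "__main__":
    # Two frames with 2-dimensional features; the second frame gets a low score.
    frames = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_pool(frames))
    print(weighted_aggregate(frames, scores=[10.0, -10.0]))
```

With equal scores the weighted version reduces to mean pooling; with unequal scores the clip-level feature is pulled toward the higher-scored frames, which is the behaviour the abstract attributes to weighting frames by completeness.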
Pages: 6083-6091
Page count: 9