Temporal Correlation Vision Transformer for Video Person Re-Identification

Cited by: 0

Authors:

Wu, Pengfei ^{[1,2]}

Wang, Le ^{[1,2]}

Zhou, Sanping ^{[1,2]}

Hua, Gang ^{[4]}

Sun, Changyin ^{[3]}

Affiliations:

[1] Xi An Jiao Tong Univ, Natl Engn Res Ctr Visual Informat & Applicat, Natl Key Lab Human Machine Hybrid Augmented Intel, Xian, Peoples R China

[2] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian, Peoples R China

[3] Anhui Univ, Sch Artificial Intelligence, Hefei, Peoples R China

[4] Wormpex AI Res, Bellevue, WA USA

Source:

THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 6 | 2024

Funding:

National Key Research and Development Program of China;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Video Person Re-Identification (Re-ID) is the task of retrieving persons from multi-camera surveillance systems. Despite the progress made in leveraging spatio-temporal information in videos, occlusion in dense crowds still hinders further progress. To address this issue, we propose a Temporal Correlation Vision Transformer (TCViT) for video person Re-ID. TCViT consists of a Temporal Correlation Attention (TCA) module and a Learnable Temporal Aggregation (LTA) module. The TCA module is designed to reduce the impact of non-target persons by relative state, while the LTA module is used to aggregate frame-level features based on their completeness. Specifically, TCA is a parameter-free module that first aligns frame-level features to restore semantic coherence in videos and then enhances the features of the target person according to temporal correlation. Additionally, unlike previous methods that treat each frame equally with a pooling layer, LTA introduces a lightweight learnable module to weight and aggregate frame-level features under the guidance of a classification score. Extensive experiments on four prevalent benchmarks demonstrate that our method achieves state-of-the-art performance in video Re-ID.
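
The contrast the abstract draws between equal-weight pooling and weighted temporal aggregation can be illustrated with a minimal sketch. This is not the authors' LTA module: the per-frame scores here are hypothetical inputs supplied by hand, whereas the paper describes a learnable module guided by a classification score, whose details are not given in this record. The function names `softmax`, `mean_pool`, and `weighted_aggregate` are likewise illustrative assumptions.

```python
import math


def softmax(scores):
    """Turn raw per-frame scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frame_features):
    """Baseline: every frame contributes equally to the clip feature."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]


def weighted_aggregate(frame_features, frame_scores):
    """Weight each frame by a softmax over its (hypothetical) score.

    In a trained model the scores would come from a learned component;
    here they are plain numbers passed in by the caller.
    """
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    return [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Three toy 2-D frame features; the last frame stands in for an occluded one.
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    scores = [2.0, 2.0, -2.0]  # assumed: low score marks the unreliable frame
    print("mean pooling:      ", mean_pool(features))
    print("weighted aggregate:", weighted_aggregate(features, scores))
```

With equal scores the weighted aggregate reduces to mean pooling; as one frame's score drops, its contribution to the clip-level feature shrinks accordingly.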

Pages: 6083-6091

Page count: 9
