3-D PersonVLAD: Learning Deep Global Representations for Video-Based Person Reidentification

Cited by: 79

Authors:

Wu, Lin [1,2]

Wang, Yang [1,3]

Shao, Ling [4]

Wang, Meng [1]

Affiliations:

[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei 230000, Anhui, Peoples R China

[2] Univ Queensland, Brisbane, Qld 4072, Australia

[3] Dalian Univ Technol, Fac Elect Engn, Dalian 116024, Peoples R China

[4] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2019 / Vol. 30 / No. 11

Funding:

National Natural Science Foundation of China;

Keywords:

Feature extraction; Spatiotemporal phenomena; Neural networks; Streaming media; Solid modeling; Learning systems; Computer science; 3-D convolution; global representations; person reidentification (re-ID); vector of local aggregated descriptors (VLAD);

DOI:

10.1109/TNNLS.2019.2891244

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We present a global deep video representation learning approach for video-based person reidentification (re-ID) that aggregates local 3-D features across the entire video extent. Existing methods typically extract frame-wise deep features from 2-D convolutional networks (ConvNets) and pool them temporally to produce video-level representations. However, 2-D ConvNets lose temporal priors immediately after the convolutions, and a separate temporal pooling step is limited to capturing human motion in short sequences. In this paper, we present a global video representation learning method, designed as a novel layer complementary to 3-D ConvNets, that captures the appearance and motion dynamics in full-length videos. Nevertheless, encoding each video frame in its entirety and computing aggregate global representations across all frames is highly challenging because of occlusions and misalignments. To resolve this, the proposed network is further augmented with 3-D part alignment to learn local features through a soft-attention module. These attended features are statistically aggregated to yield identity-discriminative representations. Our global 3-D features are demonstrated to achieve state-of-the-art results on three benchmark data sets: MARS, Imagery Library for Intelligent Detection Systems-Video Re-identification (iLIDS-VID), and PRID2011.
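
The aggregation idea named in the keywords (vector of locally aggregated descriptors, VLAD) can be illustrated with a minimal sketch. This is a generic soft-assignment VLAD over a set of local feature vectors, not the paper's actual layer: the function name `soft_assign_vlad`, the fixed `centers`, and the sharpness parameter `alpha` are assumptions introduced here for illustration, and the abstract does not specify the assignment scheme, the attention module, or the normalization used.

```python
import math

def soft_assign_vlad(features, centers, alpha=10.0):
    """Aggregate local descriptors into one global vector (hypothetical sketch).

    features: list of N local descriptors, each a list of D floats
    centers:  list of K cluster centers, each a list of D floats (assumed given)
    alpha:    assumed sharpness of the soft assignment
    Returns a flat list of K * D floats with unit L2 norm.
    """
    K, D = len(centers), len(centers[0])
    vlad = [[0.0] * D for _ in range(K)]
    for x in features:
        # Squared distance from this descriptor to every center.
        d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        # Softmax over negative distances, shifted for numerical stability.
        m = min(d2)
        w = [math.exp(-alpha * (d - m)) for d in d2]
        s = sum(w)
        # Accumulate assignment-weighted residuals per center.
        for k, c in enumerate(centers):
            a = w[k] / s
            for j in range(D):
                vlad[k][j] += a * (x[j] - c[j])
    # Normalize each center's residual sum, then the whole vector.
    flat = []
    for row in vlad:
        n = math.sqrt(sum(v * v for v in row)) or 1.0
        flat.extend(v / n for v in row)
    n = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / n for v in flat]
```

Because the residuals are summed over all descriptors, the output length depends only on the number of centers and the descriptor dimension, not on how many local features (and hence how many frames) the video contributes, and the result does not depend on the order of the descriptors.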

Pages: 3347 - 3359

Page count: 13

50 records in total

[1] Learning Deep Representations for Video-Based Intake Gesture Detection
Rouast, Philipp V.
Adam, Marc T. P.
IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2020, 24 (06) : 1727 - 1737
[2] Spatiotemporal Interaction Transformer Network for Video-Based Person Reidentification in Internet of Things
Yang, Fan
Li, Wei
Liang, Binbin
Zhang, Jianwei
IEEE INTERNET OF THINGS JOURNAL, 2023, 10 (14) : 12537 - 12547
[3] Learning Recurrent 3D Attention for Video-Based Person Re-Identification
Chen, Guangyi
Lu, Jiwen
Yang, Ming
Zhou, Jie
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 6963 - 6976
[4] Multiview Video-Based 3-D Hand Pose Estimation
Khaleghi L.
Sepas-Moghaddam A.
Marshall J.
Etemad A.
IEEE TRANSACTIONS ON ARTIFICIAL INTELLIGENCE, 2023, 4 (04) : 896 - 909
[5] CARF-Net: CNN attention and RNN fusion network for video-based person reidentification
Kansal, Kajal
Venkata, Subramanyam
Prasad, Dilip K.
Kankanhalli, Mohan
JOURNAL OF ELECTRONIC IMAGING, 2019, 28 (02)
[6] Deep asymmetric video-based person re-identification
Meng, Jingke
Wu, Ancong
Zheng, Wei-Shi
PATTERN RECOGNITION, 2019, 93 : 430 - 441
[7] Deep Learning for Video-Based Assessment in Surgery
Yanik, Erim
Schwaitzberg, Steven
De, Suvranu
JAMA SURGERY, 2024, 159 (08) : 957 - 958
[8] Few-Shot Deep Adversarial Learning for Video-Based Person Re-Identification
Wu, Lin
Wang, Yang
Yin, Hongzhi
Wang, Meng
Shao, Ling
IEEE TRANSACTIONS ON IMAGE PROCESSING, 2020, 29 : 1233 - 1245
[9] Progressive learning in cross-modal cross-scale fusion transformer for visible-infrared video-based person reidentification
Mukhtar, Hamza
Mukhtar, Umar Raza
KNOWLEDGE-BASED SYSTEMS, 2024, 304
[10] LOMO3D DESCRIPTOR FOR VIDEO-BASED PERSON RE-IDENTIFICATION
Zheng, Sutong
Li, Xiaoyu
Jiang, Zhuqing
Guo, Xiaoqiang
2017 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2017), 2017, : 672 - 676
