Multitask Multigranularity Aggregation With Global-Guided Attention for Video Person Re-Identification

Cited by: 6
Authors
Sun, Dengdi [1 ]
Huang, Jiale [2 ]
Hu, Lei [2 ]
Tang, Jin [3 ]
Ding, Zhuanlian [4 ]
Affiliations
[1] Anhui Univ, Sch Artificial Intelligence, Key Lab Intelligent Comp & Signal Proc ICSP, Minist Educ, Hefei 230601, Peoples R China
[2] Anhui Univ, Wendian Coll, Hefei 230601, Peoples R China
[3] Anhui Univ, Sch Comp Sci & Technol, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
[4] Anhui Univ, Sch Internet, Hefei 230039, Peoples R China
Keywords
Feature extraction; Multitasking; Video sequences; Task analysis; Data mining; Semantics; Convolutional neural networks; Person re-identification; video; multi-task; multi-granularity; attention mechanism; global feature; SET;
DOI
10.1109/TCSVT.2022.3183011
CLC (Chinese Library Classification)
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification Codes
0808 ; 0809 ;
Abstract
The goal of video-based person re-identification (Re-ID) is to identify the same person across multiple non-overlapping cameras. The key to accomplishing this challenging task is to sufficiently exploit both spatial and temporal cues in video sequences. However, most current methods are incapable of accurately locating semantic regions or efficiently filtering discriminative spatio-temporal features, making it difficult to handle issues such as spatial misalignment and occlusion. We therefore propose a novel feature aggregation framework, multi-task and multi-granularity aggregation with global-guided attention (MMA-GGA), which aims to adaptively generate more representative spatio-temporal aggregation features. Specifically, we develop a multi-task multi-granularity aggregation (MMA) module to extract features at different locations and scales, identifying key semantic-aware regions that are robust to spatial misalignment. Then, to determine the importance of the multi-granular semantic information, we propose a global-guided attention (GGA) mechanism that learns weights based on the global features of the video sequence, allowing our framework to identify stable local features while ignoring occlusions. The MMA-GGA framework can thus efficiently and effectively capture more robust and representative features. Extensive experiments on four benchmark datasets demonstrate that our MMA-GGA framework outperforms current state-of-the-art methods. In particular, our method achieves a rank-1 accuracy of 91.0% on the MARS dataset, the most widely used database, significantly outperforming existing methods.
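The core idea of the GGA mechanism described above — scoring each local (part- or scale-level) feature by its agreement with a global sequence feature, then aggregating with the resulting weights — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dot-product scoring, the softmax normalization, and the use of the frame average as the global feature are all simplifying assumptions made here for illustration.

```python
import numpy as np

def global_guided_attention(local_feats, global_feat):
    """Aggregate local features, weighting each by its similarity to a global feature.

    local_feats: (n, d) array of n local (part/scale) feature vectors
    global_feat: (d,) global feature of the video sequence
    Returns the (d,) aggregated feature and the (n,) attention weights.
    """
    d = global_feat.shape[0]
    # Scaled dot-product similarity between each local feature and the global one
    scores = local_feats @ global_feat / np.sqrt(d)
    # Softmax so the weights are positive and sum to 1
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum: stable local features (high agreement with the global
    # context) dominate; occluded/noisy parts receive small weights
    aggregated = weights @ local_feats
    return aggregated, weights

rng = np.random.default_rng(0)
locals_ = rng.normal(size=(4, 8))   # 4 hypothetical local features, dim 8
global_ = locals_.mean(axis=0)      # global feature taken as the mean (assumption)
agg, w = global_guided_attention(locals_, global_)
```

In the paper's setting the local features would come from the MMA module at multiple locations and scales, and the global feature from the whole sequence; the weighting step is what lets occluded regions contribute less to the final representation.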
Pages: 7758-7771 (14 pages)
Related Papers
(50 records)
  • [31] Video-based Person Re-identification Using Refined Attention Networks
    Rahman, Tanzila
    Rochan, Mrigank
    Wang, Yang
    2019 16TH IEEE INTERNATIONAL CONFERENCE ON ADVANCED VIDEO AND SIGNAL BASED SURVEILLANCE (AVSS), 2019,
  • [32] Pose matters: Pose guided graph attention network for person re-identification
    He, Zhijun
    Zhao, Hongbo
    Wang, Jianrong
    Feng, Wenquan
    CHINESE JOURNAL OF AERONAUTICS, 2023, 36 (05) : 447 - 464
  • [34] Attention guided by human keypoint for infrared-visible person re-identification
    Yu, Peng
    Tian, Xiao-jian
    Qi, Nan
    Piao, Yan
    JOURNAL OF INFRARED AND MILLIMETER WAVES, 2024, 43 (06) : 871 - 878
  • [35] CONVOLUTIONAL TEMPORAL ATTENTION MODEL FOR VIDEO-BASED PERSON RE-IDENTIFICATION
    Rahman, Tanzila
    Rochan, Mrigank
    Wang, Yang
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2019, : 1102 - 1107
  • [36] Diversity Regularized Spatiotemporal Attention for Video-based Person Re-identification
    Li, Shuang
    Bak, Slawomir
    Carr, Peter
    Wang, Xiaogang
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 369 - 378
  • [37] SCAN: Self-and-Collaborative Attention Network for Video Person Re-Identification
    Zhang, Ruimao
    Li, Jingyu
    Sun, Hongbin
    Ge, Yuying
    Luo, Ping
    Wang, Xiaogang
    Lin, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2019, 28 (10) : 4870 - 4882
  • [38] SANet: Statistic Attention Network for Video-Based Person Re-Identification
    Bai, Shutao
    Ma, Bingpeng
    Chang, Hong
    Huang, Rui
    Shan, Shiguang
    Chen, Xilin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (06) : 3866 - 3879
  • [39] Spatiotemporal Attention on Sliced Parts for Video-based Person Re-identification
    Yang, Xu
    Zhang, Bin
    Dong, Yuan
    Xiong, Fengye
    Bai, Hongliang
    2018 IEEE INTERNATIONAL CONFERENCE ON VISUAL COMMUNICATIONS AND IMAGE PROCESSING (IEEE VCIP), 2018,
  • [40] Context Sensing Attention Network for Video-based Person Re-identification
    Wang, Kan
    Ding, Changxing
    Pang, Jianxin
    Xu, Xiangmin
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2023, 19 (04)