Multitask Multigranularity Aggregation With Global-Guided Attention for Video Person Re-Identification

Cited by: 6
Authors
Sun, Dengdi [1 ]
Huang, Jiale [2 ]
Hu, Lei [2 ]
Tang, Jin [3 ]
Ding, Zhuanlian [4 ]
Affiliations
[1] Anhui Univ, Sch Artificial Intelligence, Key Lab Intelligent Comp & Signal Proc ICSP, Minist Educ, Hefei 230601, Peoples R China
[2] Anhui Univ, Wendian Coll, Hefei 230601, Peoples R China
[3] Anhui Univ, Sch Comp Sci & Technol, Anhui Prov Key Lab Multimodal Cognit Computat, Hefei 230601, Peoples R China
[4] Anhui Univ, Sch Internet, Hefei 230039, Peoples R China
Keywords
Feature extraction; Multitasking; Video sequences; Task analysis; Data mining; Semantics; Convolutional neural networks; Person re-identification; video; multi-task; multi-granularity; attention mechanism; global feature; SET;
DOI
10.1109/TCSVT.2022.3183011
CLC (Chinese Library Classification) codes
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Subject classification codes
0808 ; 0809 ;
Abstract
The goal of video-based person re-identification (Re-ID) is to identify the same person across multiple non-overlapping cameras. The key to accomplishing this challenging task is to sufficiently exploit both spatial and temporal cues in video sequences. However, most current methods cannot accurately locate semantic regions or efficiently filter discriminative spatio-temporal features, making it difficult to handle issues such as spatial misalignment and occlusion. Thus, we propose a novel feature aggregation framework, multi-task and multi-granularity aggregation with global-guided attention (MMA-GGA), which aims to adaptively generate more representative spatio-temporal aggregation features. Specifically, we develop a multi-task multi-granularity aggregation (MMA) module to extract features at different locations and scales, identifying key semantic-aware regions that are robust to spatial misalignment. Then, to determine the importance of the multi-granular semantic information, we propose a global-guided attention (GGA) mechanism that learns weights based on the global features of the video sequence, allowing our framework to identify stable local features while ignoring occlusions. The MMA-GGA framework can therefore capture more robust and representative features efficiently and effectively. Extensive experiments on four benchmark datasets demonstrate that our MMA-GGA framework outperforms current state-of-the-art methods. In particular, our method achieves a rank-1 accuracy of 91.0% on MARS, the most widely used dataset, significantly outperforming existing methods.
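The core idea the abstract describes — scoring local multi-granularity features by their affinity to a sequence-level global feature, so that occluded or unstable parts receive low weight — can be sketched as follows. This is a minimal illustrative simplification, not the paper's implementation: the function name `global_guided_attention`, the mean-pooled global feature, and the single bilinear projection `w` are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_guided_attention(local_feats, w):
    """Hypothetical sketch of global-guided attention aggregation.

    local_feats: (N, D) array of N local features (e.g. part- or
                 frame-level features at several granularities).
    w:           (D, D) projection standing in for learned parameters.
    Returns the aggregated (D,) feature and the (N,) attention weights.
    """
    # Global feature of the sequence: here simply the average of the locals.
    g = local_feats.mean(axis=0)
    # Affinity of each local feature to the global feature.
    scores = local_feats @ w @ g
    # Normalize affinities into attention weights; low-affinity
    # (e.g. occluded) locals are down-weighted.
    attn = softmax(scores)
    # Attention-weighted aggregation of the local features.
    agg = attn @ local_feats
    return agg, attn

rng = np.random.default_rng(0)
feats = rng.normal(size=(6, 8))   # 6 local features, 8-dim each
w = rng.normal(size=(8, 8))
agg, attn = global_guided_attention(feats, w)
print(agg.shape, attn.shape)      # (8,) (6,)
```

A local feature that deviates strongly from the global representation (for instance, a part dominated by an occluder) scores a low affinity and contributes little to the aggregate, which is the behavior the GGA mechanism is designed to achieve.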
Pages: 7758-7771
Number of pages: 14