Graph based Spatial-temporal Fusion for Multi-modal Person Re-identification

Cited by: 0
Authors
Zhang, Yaobin [1]
Lv, Jianming [1]
Liu, Chen [2]
Cai, Hongmin [1]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] Chinese Acad Sci, Inst Comp Technol, Beijing, Peoples R China
Keywords
Unsupervised Person re-ID; Spatio-temporal; Graph; Re-ranking; Adaptation
DOI
10.1145/3581783.3613757
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Unsupervised person re-identification (Re-ID) is a challenging task that aims to optimize a pedestrian matching model using unlabeled image frames from surveillance videos. Recently, fusion with the spatio-temporal clues of pedestrians has been proven effective in improving classification performance. However, most of these methods adopt hard combination approaches that multiply the visual scores by the spatio-temporal scores. Such approaches are sensitive to the noise caused by imprecise estimation of spatio-temporal patterns in unlabeled datasets, which limits the advantage of the fusion model. In this paper, we propose a Graph based Spatio-Temporal Fusion model for high-performance multi-modal person Re-ID, namely G-Fusion, to mitigate the impact of this noise. In particular, we construct a graph of pedestrian images by selecting neighboring nodes based on visual information and the transition time between cameras. We then use a randomly initialized two-layer GraphSAGE model to obtain the multi-modal affinity matrix between images, and apply distillation learning to optimize the visual model by learning the affinity between nodes. Finally, a graph-based multi-modal re-ranking method is used to make the decision in the testing phase for precise person Re-ID. Comprehensive experiments on two large-scale Re-ID datasets show that our method achieves a significant performance improvement when combined with SOTA unsupervised person Re-ID methods. Specifically, the mAP scores reach 92.2% and 80.4% on the Market-1501 and MSMT17 datasets, respectively.
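The contrast the abstract draws between hard score multiplication and graph-based neighbor selection can be illustrated with a small sketch. This is not the authors' implementation: the toy score matrices, the function names `hard_fusion` and `build_neighbors`, and the rule of taking the union of top-k visual and top-k spatio-temporal neighbors are all assumptions made for illustration. The GraphSAGE, distillation, and re-ranking stages are not shown.

```python
import numpy as np

def hard_fusion(visual, st):
    """Hard combination: element-wise product of visual and
    spatio-temporal scores, as described for prior methods."""
    return visual * st

def build_neighbors(visual, st, k_visual, k_st):
    """Hypothetical neighbor selection: for each image, take the union of
    its top-k visual neighbors and its top-k spatio-temporal neighbors.
    The actual selection rule of G-Fusion is not given in the abstract."""
    n = visual.shape[0]
    neighbors = []
    for i in range(n):
        v = visual[i].copy()
        s = st[i].copy()
        v[i] = -np.inf  # exclude the image itself
        s[i] = -np.inf
        top_v = np.argsort(-v)[:k_visual]
        top_s = np.argsort(-s)[:k_st]
        neighbors.append(sorted(set(top_v.tolist()) | set(top_s.tolist())))
    return neighbors

# Toy data (assumed): images 0 and 1 show the same person, as do 2 and 3.
visual = np.array([
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
])
# The spatio-temporal score between images 0 and 1 is badly underestimated,
# standing in for the noise the abstract mentions.
st = np.array([
    [1.0, 0.01, 0.6, 0.5],
    [0.01, 1.0, 0.4, 0.3],
    [0.6, 0.4, 1.0, 0.7],
    [0.5, 0.3, 0.7, 1.0],
])

fused = hard_fusion(visual, st)
# Best non-self match for image 0 under each score.
best_visual = int(np.argmax(visual[0, 1:])) + 1
best_fused = int(np.argmax(fused[0, 1:])) + 1
neighbors = build_neighbors(visual, st, k_visual=1, k_st=1)
```

In this toy setting, a single noisy spatio-temporal score is enough to change the top match for image 0 under hard fusion, while the neighbor lists still retain the visually closest image as a graph neighbor, so a later learning stage could still use it.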
Pages: 3736-3744
Number of pages: 9