Multi-Scale Contrastive Learning for Human Pose Estimation

Authors
Bao, Wenxia [1]
Lin, An [1]
Huang, Hua [1]
Yang, Xianjun [1]
Chen, Hemu [1]
Affiliation
[1] Anhui Univ, Sch Elect & Informat Engn, Hefei 230601, Anhui, Peoples R China
Keywords
human pose estimation; contrastive learning; multi-scale feature; feature pyramid network
DOI
10.1587/transinf.2024EDP7048
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Recent years have seen remarkable progress in human pose estimation. However, manual annotation of keypoints remains tedious and imprecise. To alleviate this problem, this paper proposes a novel method called Multi-Scale Contrastive Learning (MSCL). This method uses a siamese network structure with upper and lower branches that capture different views of the same image. Each branch uses a backbone network to extract image representations, employing multi-scale feature vectors to capture information. These feature vectors are then passed through an enhanced feature pyramid for fusion, producing more robust feature representations. The fused feature vectors are further encoded by mapping and prediction heads to predict the feature vector of the other view. Using negative cosine similarity between vectors as the loss function, the backbone network is pre-trained on a large-scale unlabeled dataset, enhancing its capacity to extract visual representations. Finally, transfer learning is performed on a small amount of labelled data for the pose estimation task. Experiments on the COCO dataset show significant improvements in Average Precision (AP) of 1.8%, 0.9%, and 1.2% with 1%, 5%, and 10% of the labelled data, respectively. In addition, the Percentage of Correct Keypoints (PCK) improves by 0.5% on MPII&AIC, outperforming mainstream contrastive learning methods.
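The negative cosine similarity loss mentioned in the abstract can be illustrated with a minimal sketch. The function names and the symmetric averaging over both views are assumptions made here for illustration; the abstract states only that negative cosine similarity between vectors is the loss, and this sketch does not model the backbone, the feature pyramid, or gradient flow.

```python
import math


def negative_cosine_similarity(p, z):
    """Return -cos(p, z) for two equal-length vectors (hypothetical helper)."""
    if len(p) != len(z):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    if norm_p == 0.0 or norm_z == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return -dot / (norm_p * norm_z)


def symmetric_loss(p1, z2, p2, z1):
    """Average the loss over both prediction directions.

    p1, p2: prediction-head outputs for view 1 and view 2.
    z1, z2: mapping-head outputs for view 1 and view 2.
    The 0.5/0.5 weighting is an assumption, not taken from the paper.
    """
    return 0.5 * negative_cosine_similarity(p1, z2) + 0.5 * negative_cosine_similarity(p2, z1)
```

The loss reaches its minimum of -1 when each prediction points in the same direction as the other view's feature vector, regardless of vector magnitude.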
Pages: 1332-1341
Page count: 10