Multi-Scale Contrastive Learning for Human Pose Estimation

Cited by: 0
Authors
Bao, Wenxia [1]
Lin, An [1]
Huang, Hua [1]
Yang, Xianjun [1]
Chen, Hemu [1]
Affiliations
[1] Anhui Univ, Sch Elect & Informat Engn, Hefei 230601, Anhui, Peoples R China
Keywords
human pose estimation; contrastive learning; multi-scale feature; feature pyramid network
DOI
10.1587/transinf.2024EDP7048
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Recent years have seen remarkable progress in human pose estimation. However, manual annotation of keypoints remains tedious and imprecise. To alleviate this problem, this paper proposes a novel method called Multi-Scale Contrastive Learning (MSCL). The method uses a siamese network structure whose upper and lower branches each receive a different view of the same image. Each branch uses a backbone network to extract image representations as multi-scale feature vectors. These feature vectors are fused by an enhanced feature pyramid, producing more robust feature representations. The fused feature vectors are then further encoded by mapping and prediction heads so that each branch predicts the feature vector of the other view. With the negative cosine similarity between vectors as the loss function, the backbone network is pre-trained on a large-scale unlabeled dataset, enhancing its capacity to extract visual representations. Finally, transfer learning is performed on a small amount of labelled data for the pose estimation task. Experiments on the COCO dataset show significant improvements in Average Precision (AP) of 1.8%, 0.9%, and 1.2% when using 1%, 5%, and 10% of the labelled data, respectively. In addition, the Percentage of Correct Keypoints (PCK) improves by 0.5% on MPII&AIC, outperforming mainstream contrastive learning methods.
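The abstract names negative cosine similarity as the pre-training loss but gives no formula. The following is a minimal sketch of that quantity in plain Python, not the authors' implementation. The function names, the `eps` guard, and the symmetric averaging over both views are assumptions borrowed from common siamese setups. In a real training loop the target vector would typically be treated as a constant (stop-gradient), which this gradient-free sketch cannot show.

```python
import math

def negative_cosine_similarity(p, z, eps=1e-12):
    """Return -cos(p, z) for two equal-length vectors.

    p: output of a prediction head for one view (hypothetical name).
    z: target feature vector from the other view (hypothetical name).
    eps: assumed guard against division by zero for all-zero vectors.
    """
    dot = sum(a * b for a, b in zip(p, z))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_z = math.sqrt(sum(b * b for b in z))
    return -dot / max(norm_p * norm_z, eps)

def symmetric_loss(p1, z2, p2, z1):
    """Average the loss over both prediction directions.

    Symmetrising is an assumption; the abstract only states that each
    branch predicts the feature vector of the other view.
    """
    return 0.5 * negative_cosine_similarity(p1, z2) \
         + 0.5 * negative_cosine_similarity(p2, z1)
```

The loss is bounded in [-1, 1]: it reaches -1 when the prediction and the target point in the same direction, and 0 when they are orthogonal, so minimising it pulls the two views' representations into alignment regardless of their magnitudes.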
Pages: 1332-1341
Number of pages: 10