A Local-Global Estimator Based on Large Kernel CNN and Transformer for Human Pose Estimation and Running Pose Measurement

Cited by: 1
Authors
Wu, Qingtian [1 ]
Wu, Yongfei [2 ]
Zhang, Yu [1 ,3 ]
Zhang, Liming
Affiliations
[1] Univ Macau, Fac Sci & Technol, Macau, Peoples R China
[2] Taiyuan Univ Technol, Coll Data Sci, Taiyuan 030024, Peoples R China
[3] Shenyang Univ Chem Technol, Comp Sci & Technol Coll, Shenyang 110142, Peoples R China
Keywords
Transformers; Pose estimation; Convolutional neural networks; Feature extraction; Visualization; Task analysis; Kernel; Convolutional neural networks (CNN); human pose estimation (HPE); local-global estimator; running pose measurement; vision transformer (ViT)
DOI
10.1109/TIM.2022.3200438
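Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic and communication technology]
Discipline classification codes
0808; 0809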
Abstract
Running poses in a crowd can serve as an early warning of most abnormal events (e.g., chasing, fleeing, and robbing), and such warnings can be achieved through human behavior analysis based on human pose measurement. Although deep convolutional neural networks (CNNs) have achieved impressive progress on human pose estimation (HPE), how to further improve the trade-off between estimation accuracy and speed remains an open issue. In this work, we first propose an efficient local-global estimator for HPE (called LGPose). Then, based on the keypoints estimated by our LGPose, a simple regression model is defined using the geometry of the joints to achieve fast and accurate running pose measurement. To model the relationships between the human keypoints, a vision transformer (ViT) encoder is adopted to learn the long-range interdependencies between them at the pixel level. However, the operation of the transformer encoder is based on sequence processing that linearly projects the 2-D image patches to 1-D tokens. This projection loses important local information, yet locality is crucial because it relates to lines, edges, and shapes. To learn locality, we design effective CNN modules that replace the original fully connected network (FCN) in the feedforward module of the ViT. Experiments on the MPII and COCO Keypoint val2017 datasets show that the proposed LGPose achieves the best trade-off among the compared state-of-the-art methods. Moreover, we build a lightweight running movement dataset to verify the effectiveness of our LGPose. Based on the human pose estimated by our LGPose, we propose a regression model to measure running pose with an accuracy of 86.4% without training any other classifier. Our source code and running dataset will be made publicly available.
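The abstract says the running pose measurement is built on the geometry of the estimated joints but does not give the regression model itself. As a purely illustrative sketch of one geometric feature such a model might consume, the Python below computes the angle at a joint from three 2-D keypoints. The function name `joint_angle`, the choice of the knee angle, and the pixel coordinates are all assumptions made for illustration; they are not taken from the paper and do not represent the authors' method.

```python
import math


def joint_angle(a, b, c):
    """Return the angle in degrees at point b, formed by the segments b->a and b->c.

    Hypothetical helper for illustration only; each point is an (x, y) tuple.
    """
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1 = math.hypot(v1[0], v1[1])
    n2 = math.hypot(v2[0], v2[1])
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate joint: two keypoints coincide")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cos_t = max(-1.0, min(1.0, cos_t))
    return math.degrees(math.acos(cos_t))


# Assumed example keypoints in pixel coordinates (not from the paper's dataset).
hip, knee, ankle = (100.0, 100.0), (100.0, 150.0), (150.0, 150.0)
knee_angle = joint_angle(hip, knee, ankle)
print(f"knee angle: {knee_angle:.1f} degrees")
```

Angles like this could be computed per frame from any keypoint estimator's output and then fed to a simple rule or regressor; which joints and thresholds the published model actually uses is not stated in the abstract.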
Pages: 12
Related papers (50 in total)
  • [1] LOCAL TO GLOBAL TRANSFORMER FOR VIDEO BASED 3D HUMAN POSE ESTIMATION
    Ma, Haifeng
    Lu, Ke
    Xue, Jian
    Niu, Zehai
    Gao, Pengcheng
    2022 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (IEEE ICMEW 2022), 2022
  • [2] RGBD Object Pose Recognition using Local-Global Multi-Kernel Regression
    El-Gaaly, Tarek
    Torki, Marwan
    Elgammal, Ahmed
    Singh, Maneesh
    2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 2468 - 2471
  • [3] Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
    Xue, Nan
    Wu, Tianfu
    Xia, Gui-Song
    Zhang, Liangpei
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13055 - 13064
  • [4] Human Pose Estimation using Global and Local Normalization
    Sun, Ke
    Lan, Cuiling
    Xing, Junliang
    Zeng, Wenjun
    Liu, Dong
    Wang, Jingdong
    2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5600 - 5608
  • [5] A combined local and global structure module for human pose estimation
    Yang, Zhihui
    Tang, Xiangyu
    Zhang, Lijuan
    Yang, Zhiling
    JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2021, 21 (06) : 1913 - 1923
  • [6] Transformer-based rapid human pose estimation network
    Wang, Dong
    Xie, Wenjun
    Cai, Youcheng
    Li, Xinjie
    Liu, Xiaoping
    COMPUTERS & GRAPHICS-UK, 2023, 116 : 317 - 326
  • [7] CTHPose: An Efficient and Effective CNN-Transformer Hybrid Network for Human Pose Estimation
    Chen, Danya
    Wu, Lijun
    Chen, Zhicong
    Lin, Xufeng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT V, 2024, 14429 : 327 - 339
  • [8] GLPose: Global-Local Representation Learning for Human Pose Estimation
    Jiao, Yingying
    Chen, Haipeng
    Feng, Runyang
    Chen, Haoming
    Wu, Sifan
    Yin, Yifang
    Liu, Zhenguang
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (02)
  • [9] Pose Estimation of Robot End-Effector using a CNN-Based Cascade Estimator
    Ortega, Kevin D.
    Sepulveda, Jorge I.
    Hernandez, Byron
    Holguin, German A.
    Medeiros, Henry
    2023 IEEE 6TH COLOMBIAN CONFERENCE ON AUTOMATIC CONTROL, CCAC, 2023, : 85 - 90
  • [10] Spatiotemporal Learning Transformer for Video-Based Human Pose Estimation
    Gai, Di
    Feng, Runyang
    Min, Weidong
    Yang, Xiaosong
    Su, Pengxiang
    Wang, Qi
    Han, Qing
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4564 - 4576