A Local-Global Estimator Based on Large Kernel CNN and Transformer for Human Pose Estimation and Running Pose Measurement

Cited by: 1

Authors:

Wu, Qingtian [1]

Wu, Yongfei [2]

Zhang, Yu [1,3]

Zhang, Liming

Affiliations:

[1] Univ Macau, Fac Sci & Technol, Macau, Peoples R China

[2] Taiyuan Univ Technol, Coll Data Sci, Taiyuan 030024, Peoples R China

[3] Shenyang Univ Chem Technol, Comp Sci & Technol Coll, Shenyang 110142, Peoples R China

Source:

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT | 2022 / Vol. 71

Keywords:

Transformers; Pose estimation; Convolutional neural networks; Feature extraction; Visualization; Task analysis; Kernel; Convolutional neural networks (CNN); human pose estimation (HPE); local-global estimator; running pose measurement; vision transformer (ViT)

DOI:

10.1109/TIM.2022.3200438

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Subject classification codes:

0808; 0809

Abstract:

A running pose in a crowd can serve as an early warning of most abnormal events (e.g., chasing, fleeing, and robbing); such a warning can be obtained through human behavior analysis based on human pose measurement. Although deep convolutional neural networks (CNNs) have achieved impressive progress on human pose estimation (HPE), how to further improve the trade-off between estimation accuracy and speed remains an open issue. In this work, we first propose an efficient local-global estimator for HPE (called LGPose). Then, based on the keypoints estimated by LGPose, a simple regression model is defined using the geometry of the joints to achieve fast and accurate running pose measurement. To model the relationships between the human keypoints, a vision transformer (ViT) encoder is adopted to learn the long-range interdependencies between them at the pixel level. However, the transformer encoder operates on sequences: it linearly projects the 2-D image patches to 1-D tokens and thereby loses important local information. Locality is nevertheless crucial, since it relates to lines, edges, and shapes. To learn locality, we integrate effective CNN modules into the feedforward module of the ViT in place of the original fully-connected network (FCN). Experiments on the MPII and COCO Keypoint val2017 datasets show that the proposed LGPose achieves the best trade-off among the compared state-of-the-art methods. Moreover, we build a lightweight running movement dataset to verify the effectiveness of LGPose. Based on the human pose estimated by LGPose, we propose a regression model that measures running pose with an accuracy of 86.4% without training any other classifier. Our source code and running dataset will be made publicly available.
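The idea of restoring locality inside the transformer feedforward module can be illustrated with a minimal NumPy sketch. It is not the authors' LGPose implementation: the function names `depthwise_conv2d` and `local_feed_forward`, the depthwise convolution, the ReLU, and the residual connection are all assumptions chosen for illustration. The sketch only shows the general technique the abstract describes, namely reshaping the 1-D token sequence back to its 2-D grid so that a convolution can mix neighbouring tokens.

```python
import numpy as np


def depthwise_conv2d(x, kernels):
    """Per-channel k x k cross-correlation with zero padding and stride 1.

    x:       array of shape (H, W, C)
    kernels: array of shape (k, k, C), k odd (hypothetical weights)
    """
    k = kernels.shape[0]
    pad = k // 2
    h, w, _ = x.shape
    padded = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros_like(x, dtype=float)
    # Accumulate one shifted copy of the input per kernel tap.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w, :] * kernels[dy, dx, :]
    return out


def local_feed_forward(tokens, grid_hw, kernels):
    """Illustrative convolutional feedforward step (an assumption, not LGPose).

    tokens:  (N, C) token sequence with N = H * W
    grid_hw: (H, W) layout of the patches the tokens came from
    """
    h, w = grid_hw
    n, c = tokens.shape
    if n != h * w:
        raise ValueError("token count must equal H * W")
    grid = tokens.reshape(h, w, c)            # restore the 2-D patch layout
    mixed = depthwise_conv2d(grid, kernels)   # mix spatially adjacent tokens
    mixed = np.maximum(mixed, 0.0)            # ReLU non-linearity
    return tokens + mixed.reshape(n, c)       # residual connection


# Usage: 6 tokens with 2 channels, laid out on a 2 x 3 grid, 3 x 3 kernels.
tokens = np.arange(12, dtype=float).reshape(6, 2)
kernels = np.full((3, 3, 2), 1.0 / 9.0)
output = local_feed_forward(tokens, (2, 3), kernels)
```

A plain fully-connected feedforward layer would process each token independently; here each output token also depends on its spatial neighbours, which is the locality that the linear patch projection discards.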


Pages: 12

50 items in total

[1] LOCAL TO GLOBAL TRANSFORMER FOR VIDEO BASED 3D HUMAN POSE ESTIMATION
Ma, Haifeng
Lu, Ke
Xue, Jian
Niu, Zehai
Gao, Pengcheng
2022 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (IEEE ICMEW 2022), 2022
[2] RGBD Object Pose Recognition using Local-Global Multi-Kernel Regression
El-Gaaly, Tarek
Torki, Marwan
Elgammal, Ahmed
Singh, Maneesh
2012 21ST INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR 2012), 2012, : 2468 - 2471
[3] Learning Local-Global Contextual Adaptation for Multi-Person Pose Estimation
Xue, Nan
Wu, Tianfu
Xia, Gui-Song
Zhang, Liangpei
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13055 - 13064
[4] Human Pose Estimation using Global and Local Normalization
Sun, Ke
Lan, Cuiling
Xing, Junliang
Zeng, Wenjun
Liu, Dong
Wang, Jingdong
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017, : 5600 - 5608
[5] A combined local and global structure module for human pose estimation
Yang, Zhihui
Tang, Xiangyu
Zhang, Lijuan
Yang, Zhiling
JOURNAL OF COMPUTATIONAL METHODS IN SCIENCES AND ENGINEERING, 2021, 21 (06) : 1913 - 1923
[6] Transformer-based rapid human pose estimation network
Wang, Dong
Xie, Wenjun
Cai, Youcheng
Li, Xinjie
Liu, Xiaoping
COMPUTERS & GRAPHICS-UK, 2023, 116 : 317 - 326
[7] CTHPose: An Efficient and Effective CNN-Transformer Hybrid Network for Human Pose Estimation
Chen, Danya
Wu, Lijun
Chen, Zhicong
Lin, Xufeng
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT V, 2024, 14429 : 327 - 339
[8] GLPose: Global-Local Representation Learning for Human Pose Estimation
Jiao, Yingying
Chen, Haipeng
Feng, Runyang
Chen, Haoming
Wu, Sifan
Yin, Yifang
Liu, Zhenguang
ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2022, 18 (02)
[9] Pose Estimation of Robot End-Effector using a CNN-Based Cascade Estimator
Ortega, Kevin D.
Sepulveda, Jorge I.
Hernandez, Byron
Holguin, German A.
Medeiros, Henry
2023 IEEE 6TH COLOMBIAN CONFERENCE ON AUTOMATIC CONTROL, CCAC, 2023, : 85 - 90
[10] Spatiotemporal Learning Transformer for Video-Based Human Pose Estimation
Gai, Di
Feng, Runyang
Min, Weidong
Yang, Xiaosong
Su, Pengxiang
Wang, Qi
Han, Qing
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (09) : 4564 - 4576
