Detecting and grouping keypoints for multi-person pose estimation using instance-aware attention

Times cited: 16
Authors
Yang, Sen [1,2]
Feng, Ze [1,2]
Wang, Zhicheng [3]
Li, Yanjie [4]
Zhang, Shoukui [5]
Quan, Zhibin [1,2]
Xia, Shu-tao [4]
Yang, Wankou [1,2]
Affiliations
[1] Southeast Univ, Sch Automat, Nanjing 210096, Peoples R China
[2] Southeast Univ, Key Lab Measurement & Control Complex Syst Engn, Minist Educ, Nanjing 210096, Peoples R China
[3] Nreal, Beijing, Peoples R China
[4] Tsinghua Univ, Tsinghua Shenzhen Int Grad Sch, Beijing, Peoples R China
[5] Meituan, Beijing, Peoples R China
Keywords
Multi-person human pose estimation; Self-attention; Bottom-up; Transformer; Grouping; Keypoints association
DOI
10.1016/j.patcog.2022.109232
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Bottom-up human pose estimation models detect keypoints and learn associative information between them, usually relying on human-predefined offset fields or embeddings to group (cluster) keypoints. In this paper, we present a new Transformer-based method that removes this dependence, making the grouping process free of human-defined associative signals. Specifically, self-attention in a vision Transformer measures the feature similarity between any pair of locations, which provides a metric space for associating keypoints with their corresponding human instances. However, the attention patterns that form naively in a Transformer are not explicitly controlled, so there is no guarantee that keypoints attend only to the instances they belong to. To address this, we propose a novel approach that supervises self-attention to be instance-aware, accomplishing multi-person keypoint detection and clustering simultaneously. The detected keypoints can then be grouped into their corresponding instances according to the pairwise attention scores. An additional benefit of our method is that instance segmentation results for any number of people can be obtained directly from the supervised attention matrix, which simplifies the pixel assignment pipeline. Qualitative and quantitative results on the COCO dataset show that, with a very simple architecture, our method achieves performance comparable to CNN-based bottom-up counterparts with fewer parameters, and demonstrates a promising way to control the behavior of the self-attention mechanism for specific purposes.
(c) 2022 Elsevier Ltd. All rights reserved.
Pages: 12
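The abstract says detected keypoints are grouped "according to the pairwise attention scores" and that instance masks come "directly from the supervised attention matrix", but this record does not give the decoding procedure. The following is a minimal sketch of one way such scores could drive grouping and mask extraction, assuming an N x N attention matrix over flattened feature-map locations. The greedy highest-score-first assignment, the symmetrized affinity, the thresholds, and every name (`Keypoint`, `Person`, `group_keypoints`, `instance_mask`) are hypothetical illustrations, not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class Keypoint:
    joint: int    # joint type index (hypothetical, e.g. 0 = nose)
    loc: int      # flattened feature-map index: row * width + col
    score: float  # detection confidence


@dataclass
class Person:
    # joint type -> Keypoint; at most one keypoint per joint type
    keypoints: dict = field(default_factory=dict)


def pair_affinity(attn, i, j):
    """Symmetrized attention score between locations i and j (an assumption)."""
    return 0.5 * (attn[i][j] + attn[j][i])


def group_keypoints(attn, keypoints, threshold=0.1):
    """Greedily assign keypoints to people using pairwise attention scores.

    attn[i][j] is the attention that location i pays to location j.
    Keypoints are visited in descending score order. Each joins the person
    with the highest mean affinity that lacks that joint type, or starts a
    new person if no candidate's affinity exceeds `threshold` (hypothetical).
    """
    people = []
    for kp in sorted(keypoints, key=lambda k: k.score, reverse=True):
        best, best_aff = None, threshold
        for person in people:
            if kp.joint in person.keypoints:
                continue  # this person already has this joint type
            members = list(person.keypoints.values())
            aff = sum(pair_affinity(attn, kp.loc, m.loc) for m in members) / len(members)
            if aff > best_aff:
                best, best_aff = person, aff
        if best is None:
            best = Person()
            people.append(best)
        best.keypoints[kp.joint] = kp
    return people


def instance_mask(attn, person, threshold=0.1):
    """Flag every location whose mean attention from the person's keypoints exceeds `threshold`."""
    locs = [kp.loc for kp in person.keypoints.values()]
    return [
        sum(attn[l][j] for l in locs) / len(locs) > threshold
        for j in range(len(attn))
    ]
```

With a toy block-diagonal attention matrix where locations 0 and 1 attend only to each other, and likewise 2 and 3, keypoints at locations 0 and 1 end up in one person and those at 2 and 3 in another, and `instance_mask` returns `[True, True, False, False]` for the first person. Visiting keypoints by descending score is a common greedy heuristic; a real decoder might instead solve a global assignment.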