Dynamic Gesture Data Optimization and Recognition Based on Encoded Video

Cited by: 0
Authors
Xie X. [1]
Cao P. [1]
Xia H. [1]
Chen Y. [1]
Affiliations
[1] School of Computer, Xi’an University of Posts and Telecommunications, Xi’an
Keywords
data optimization; dynamic gesture recognition; encoded video; motion vector; residual
DOI
10.13190/j.jbupt.2023-072
Abstract
Syntax elements in encoded video data streams, such as motion vectors (MVs) and residuals, can substitute for optical flow in motion representation. However, their inherent pixel noise and feature sparsity can introduce errors when fine movements are recognized. Hence, a dynamic gesture recognition framework is designed to achieve higher precision and lower complexity through data optimization of the syntax elements in encoded video. First, a key P-frame selection strategy is introduced to cope with feature sparsity by selecting encoded frames that carry higher information content. Second, a joint residual feature representation method is proposed to remove noisy MVs not associated with the hand, using the finer gesture contour maps obtained from the residuals. Finally, a lightweight and efficient dynamic gesture recognition model is designed that leverages the optimized MVs and residuals to achieve a computational effect similar to optical flow. The proposed method is validated on the VIVA, Sheffield Kinect gesture (SKIG), NvGesture, and EgoGesture datasets. Experimental results show that, using only RGB data, the method achieves recognition accuracies of 82.94%, 99.72%, 81.12%, and 90.48%, respectively, while reducing storage overhead by 89% and matching advanced methods at a running speed 4.7 times faster. © 2024 Beijing University of Posts and Telecommunications. All rights reserved.
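The key P-frame selection idea described in the abstract can be sketched as ranking P-frames by an information-content proxy and keeping the top-k. The following is a minimal illustration only: the function name, the input format (a per-frame list of (dx, dy) motion vectors), and the use of total MV magnitude as the scoring heuristic are assumptions for illustration, not details taken from the paper.

```python
def select_key_p_frames(mv_fields, k):
    """Return the indices (in temporal order) of the k P-frames whose
    motion-vector fields carry the most motion.

    mv_fields: list of per-frame MV lists, each MV a (dx, dy) tuple.
    """
    scores = []
    for idx, mvs in enumerate(mv_fields):
        # Proxy for information content: total MV magnitude in the frame.
        score = sum((dx * dx + dy * dy) ** 0.5 for dx, dy in mvs)
        scores.append((score, idx))
    scores.sort(reverse=True)               # highest-motion frames first
    return sorted(idx for _, idx in scores[:k])  # restore temporal order

# Example: frame 0 is nearly static, frames 1 and 2 contain real motion.
frames = [
    [(0, 0), (1, 0)],   # score 1.0
    [(3, 4), (6, 8)],   # score 15.0
    [(1, 1), (2, 2)],   # score ~4.24
]
print(select_key_p_frames(frames, 2))  # -> [1, 2]
```

In practice the MV fields would come from the decoder's syntax elements rather than being constructed by hand, and the scoring function could equally weight residual energy; the sketch only shows the selection mechanism.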
Pages: 90-96
Page count: 6
Related Papers
15 items in total
  • [1] LI D X, CHEN Y M, GAO M K, et al., Multimodal gesture recognition using densely connected convolution and BLSTM, 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3365-3370, (2018)
  • [2] ABAVISANI M, JOZE H R V, PATEL V M., Improving the performance of unimodal dynamic hand-gesture recognition with multimodal training, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1165-1174, (2020)
  • [3] WU C Y, ZAHEER M, HU H X, et al., Compressed video action recognition, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6026-6035, (2018)
  • [4] XIE X Y, ZHAO H, JIANG L., Dynamic gesture recognition based on characteristics of encoded video data, Journal of Beijing University of Posts and Telecommunications, 43, 5, pp. 91-97, (2020)
  • [5] WANG L M, XIONG Y J, WANG Z, et al., Temporal segment networks: towards good practices for deep action recognition, European Conference on Computer Vision, pp. 20-36, (2016)
  • [6] LIN J, GAN C, HAN S., TSM: temporal shift module for efficient video understanding, 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7082-7092, (2020)
  • [7] WANG Q L, WU B G, ZHU P F, et al., ECA-Net: efficient channel attention for deep convolutional neural networks, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11531-11539, (2020)
  • [8] CHO K, VAN MERRIENBOER B, GULCEHRE C, et al., Learning phrase representations using RNN encoder-decoder for statistical machine translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724-1734, (2014)
  • [9] OHN-BAR E, TRIVEDI M M., Hand gesture recognition in real time for automotive interfaces: a multimodal vision-based approach and evaluations, IEEE Transactions on Intelligent Transportation Systems, 15, 6, pp. 2368-2377, (2014)
  • [10] TRAN D, BOURDEV L, FERGUS R, et al., Learning spatiotemporal features with 3D convolutional networks, 2015 IEEE International Conference on Computer Vision (ICCV), pp. 4489-4497, (2016)