Real-Time Dynamic Gesture Recognition Algorithm Based on Adaptive Information Fusion and Multi-Scale Optimization Transformer

Cited by: 1
Authors
Lu, Guangda [1, 2]
Sun, Wenhao [1, 2]
Qin, Zhuanping [1, 2]
Guo, Tinghang [1, 2]
Affiliations
[1] Tianjin Univ Technol & Educ, Sch Automat & Elect Engn, 1310 Dagu South Rd, Tianjin 300222, Peoples R China
[2] Tianjin Key Lab Informat Sensing & Intelligent Co, 1310 DaGu South Rd, Tianjin 300222, Peoples R China
Keywords
dynamic gesture recognition; Transformer; optical flow; information fusion
DOI
10.20965/jaciii.2023.p1096
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Gesture recognition is a popular technology in the field of computer vision and an important technical means of achieving human-computer interaction. To address problems such as the limited long-range feature extraction capability of existing dynamic gesture recognition networks based on convolutional operators, we propose a dynamic gesture recognition algorithm based on a spatial pyramid pooling Transformer and optical flow information fusion. We take advantage of the Transformer's large receptive field to reduce model computation, and we embed spatial pyramid pooling to improve the model's ability to extract features at different scales. We use an optical flow algorithm with a global motion aggregation module to obtain an optical flow map of hand motion and to extract key frames based on the similarity minimization principle. We also design an adaptive feature fusion method to fuse the spatial and temporal features of the dual channels. We demonstrate through ablation experiments how each model component contributes to recognition performance. We conduct training and validation on the SCUT-DHGA dynamic gesture dataset and on a dataset we collected, and we perform real-time dynamic gesture recognition tests using the trained model. The results show that our algorithm achieves high accuracy even while keeping the parameters balanced. It also achieves fast and accurate recognition of dynamic gestures in real-time tests.
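As an illustration of the spatial pyramid pooling idea mentioned in the abstract, the sketch below max-pools a single feature map over grids of several sizes and concatenates the results, so that inputs of different spatial sizes yield a descriptor of the same length. This is generic spatial pyramid pooling in NumPy, not the authors' implementation: the function name `spatial_pyramid_pool`, the pyramid levels `(1, 2, 4)`, the use of max pooling, and the `(C, H, W)` layout are all assumptions made for the example, and the record does not say how the pooling is embedded in the Transformer.

```python
import numpy as np


def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map over n x n grids and concatenate.

    Hypothetical helper for illustration; the levels are an assumption.
    Returns a 1-D array of length C * sum(n * n for n in levels).
    """
    channels, height, width = feature_map.shape
    pooled = []
    for n in levels:
        if n > height or n > width:
            raise ValueError("pyramid level exceeds feature map size")
        for i in range(n):
            # Bin edges: floor for the start, ceil for the end, so that
            # the bins cover the whole map and are never empty.
            r0 = (i * height) // n
            r1 = ((i + 1) * height + n - 1) // n
            for j in range(n):
                c0 = (j * width) // n
                c1 = ((j + 1) * width + n - 1) // n
                # Per-channel maximum inside this bin, shape (C,).
                pooled.append(feature_map[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(pooled)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for shape in [(3, 13, 9), (3, 8, 8)]:
        descriptor = spatial_pyramid_pool(rng.normal(size=shape))
        print(shape, "->", descriptor.shape)
```

With the assumed levels, each channel contributes 1 + 4 + 16 = 21 values, so both example inputs map to a descriptor of length 63 despite their different spatial sizes.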
Pages: 1096-1107
Page count: 12