Multi-Frequency RF Sensor Fusion for Word-Level Fluent ASL Recognition

Cited by: 9
Authors
Gurbuz, Sevgi Z. [1 ]
Rahman, M. Mahbubur [1 ]
Kurtoglu, Emre [1 ]
Malaia, Evie [2 ]
Gurbuz, Ali Cafer [3 ]
Griffin, Darrin J. [4 ]
Crawford, Chris [5 ]
Affiliations
[1] Univ Alabama, Dept Elect & Comp Engn, Tuscaloosa, AL 35487 USA
[2] Univ Alabama, Dept Commun Disorders, Tuscaloosa, AL 35487 USA
[3] Mississippi State Univ, Dept Elect & Comp Engn, Starkville, MS 39762 USA
[4] Univ Alabama, Dept Commun Studies, Tuscaloosa, AL 35487 USA
[5] Univ Alabama, Dept Comp Sci, Tuscaloosa, AL 35487 USA
Funding
U.S. National Science Foundation;
Keywords
Sensors; Radio frequency; Radar; Bandwidth; Auditory system; Time-frequency analysis; Sensor fusion; American sign language; gesture recognition; radar micro-Doppler; RF sensing; deep learning; autoencoders; DOPPLER RADAR; CLASSIFICATION;
DOI
10.1109/JSEN.2021.3078339
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
Deaf spaces are unique indoor environments designed to optimize visual communication and Deaf cultural expression. However, much of the technological research geared towards the deaf involves the use of video or wearables for American Sign Language (ASL) translation, with little consideration for Deaf perspectives on the privacy and usability of the technology. In contrast to video, RF sensors offer an avenue for ambient ASL recognition while also preserving privacy for Deaf signers. Methods: This paper investigates the RF transmit waveform parameters required for effective measurement of ASL signs and their effect on the word-level classification accuracy attained with transfer learning and convolutional autoencoders (CAEs). A multi-frequency fusion network is proposed to exploit data from all sensors in an RF sensor network and improve the recognition accuracy of fluent ASL signing. Results: For fluent signers, CAEs yield a 20-sign classification accuracy of 76% at 77 GHz and 73% at 24 GHz, while at X-band (10 GHz) the accuracy drops to 67%. For hearing imitation signers, the signs are more separable, resulting in 96% accuracy with CAEs. Further, fluent ASL recognition accuracy is significantly increased by the multi-frequency fusion network, which boosts the 20-sign fluent ASL recognition accuracy to 95%, surpassing conventional feature-level fusion by 12%. Implications: Signing involves finer spatiotemporal dynamics than typical hand gestures and thus requires interrogation with a transmit waveform that has a rapid succession of pulses and high bandwidth. Millimeter-wave RF frequencies also yield greater accuracy due to the increased Doppler spread of the radar backscatter. Comparative analysis of articulation dynamics also shows that imitation signing is not representative of fluent signing and is not effective for pre-training networks for fluent ASL classification.
Deep neural networks employing multi-frequency fusion capture both shared and sensor-specific features, and thus offer significant performance gains over a single sensor or feature-level fusion.
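The branch-then-fuse idea described in the abstract can be illustrated with a toy numpy sketch: each RF sensor (77 GHz, 24 GHz, X-band) gets its own encoder so sensor-specific features are preserved, their embeddings are concatenated, and a shared classifier head predicts one of the 20 signs. All array sizes, weights, and function names here are hypothetical and untrained; the actual paper uses convolutional autoencoders, not the single dense layers shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def branch_encoder(x, w):
    """Sensor-specific encoder: one dense layer with ReLU (toy stand-in for a CAE)."""
    return np.maximum(0.0, x @ w)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

n_features, hidden, n_signs = 128, 32, 20

# One flattened micro-Doppler signature per sensor (random toy data).
x77 = rng.standard_normal(n_features)   # 77 GHz millimeter-wave radar
x24 = rng.standard_normal(n_features)   # 24 GHz radar
x10 = rng.standard_normal(n_features)   # X-band (10 GHz) radar

# Untrained, sensor-specific weights (hypothetical).
w77, w24, w10 = (rng.standard_normal((n_features, hidden)) for _ in range(3))

# Fuse the sensor-specific embeddings, then classify jointly:
# the shared head sees both common and per-sensor structure.
fused = np.concatenate([branch_encoder(x77, w77),
                        branch_encoder(x24, w24),
                        branch_encoder(x10, w10)])
w_out = rng.standard_normal((3 * hidden, n_signs))
probs = softmax(fused @ w_out)
print(probs.shape)
```

By contrast, conventional feature-level fusion would concatenate the raw inputs before any encoding, forcing a single shared encoder and losing per-sensor specialization; the paper reports a 12% accuracy gap between the two approaches.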
Pages: 11373 - 11381 (9 pages)