Multi-Frequency RF Sensor Fusion for Word-Level Fluent ASL Recognition

Cited by: 9
|
Authors
Gurbuz, Sevgi Z. [1 ]
Rahman, M. Mahbubur [1 ]
Kurtoglu, Emre [1 ]
Malaia, Evie [2 ]
Gurbuz, Ali Cafer [3 ]
Griffin, Darrin J. [4 ]
Crawford, Chris [5 ]
Affiliations
[1] Univ Alabama, Dept Elect & Comp Engn, Tuscaloosa, AL 35487 USA
[2] Univ Alabama, Dept Commun Disorders, Tuscaloosa, AL 35487 USA
[3] Mississippi State Univ, Dept Elect & Comp Engn, Starkville, MS 39762 USA
[4] Univ Alabama, Dept Commun Studies, Tuscaloosa, AL 35487 USA
[5] Univ Alabama, Dept Comp Sci, Tuscaloosa, AL 35487 USA
Funding
US National Science Foundation;
Keywords
Sensors; Radio frequency; Radar; Bandwidth; Auditory system; Time-frequency analysis; Sensor fusion; American sign language; gesture recognition; radar micro-Doppler; RF sensing; deep learning; autoencoders; DOPPLER RADAR; CLASSIFICATION;
DOI
10.1109/JSEN.2021.3078339
CLC Classification Number
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
Deaf spaces are unique indoor environments designed to optimize visual communication and Deaf cultural expression. However, much of the technological research geared towards the deaf involves the use of video or wearables for American Sign Language (ASL) translation, with little consideration of Deaf perspectives on the privacy and usability of the technology. In contrast to video, RF sensors offer an avenue for ambient ASL recognition while also preserving the privacy of Deaf signers. Methods: This paper investigates the RF transmit waveform parameters required for effective measurement of ASL signs and their effect on the word-level classification accuracy attained with transfer learning and convolutional autoencoders (CAEs). A multi-frequency fusion network is proposed to exploit data from all sensors in an RF sensor network and improve the recognition accuracy of fluent ASL signing. Results: For fluent signers, CAEs yield a 20-sign classification accuracy of 76% at 77 GHz and 73% at 24 GHz, while at X-band (10 GHz) the accuracy drops to 67%. For hearing imitation signers, signs are more separable, resulting in 96% accuracy with CAEs. Further, fluent ASL recognition accuracy is significantly increased by the multi-frequency fusion network, which boosts the 20-sign fluent ASL recognition accuracy to 95%, surpassing conventional feature-level fusion by 12%. Implications: Signing involves finer spatiotemporal dynamics than typical hand gestures and thus requires interrogation with a transmit waveform that has a rapid succession of pulses and high bandwidth. Millimeter-wave RF frequencies also yield greater accuracy due to the increased Doppler spread of the radar backscatter. Comparative analysis of articulation dynamics also shows that imitation signing is not representative of fluent signing and is not effective for pre-training networks for fluent ASL classification.
Deep neural networks employing multi-frequency fusion capture both shared and sensor-specific features, and thus offer significant performance gains in comparison to using a single sensor or feature-level fusion.
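The fusion scheme the abstract describes, sensor-specific encoder branches whose latent features are combined by a shared classification head, can be sketched in miniature. The following is an illustrative NumPy sketch under assumed dimensions (64-dim micro-Doppler features per sensor, 16-dim latents, 20 sign classes); it is not the authors' implementation, and all layer sizes, weights, and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def branch(x, w, b):
    """Sensor-specific encoder: one dense layer per RF sensor."""
    return relu(x @ w + b)

# Assumed dimensions: each sensor yields a 64-dim feature vector,
# encoded to a 16-dim latent; the head scores 20 ASL signs.
d_in, d_lat, n_classes = 64, 16, 20

# One encoder per sensor (77 GHz, 24 GHz, X-band); weights are
# random placeholders standing in for trained parameters.
sensors = ["77GHz", "24GHz", "Xband"]
weights = {s: (rng.normal(size=(d_in, d_lat)) * 0.1,
               np.zeros(d_lat)) for s in sensors}

# Shared fusion head maps the concatenated latents to class scores.
w_fuse = rng.normal(size=(len(sensors) * d_lat, n_classes)) * 0.1
b_fuse = np.zeros(n_classes)

def fuse_and_classify(features):
    """features: dict mapping sensor name -> (d_in,) feature vector.
    Returns logits over the 20 sign classes."""
    latents = [branch(features[s], *weights[s]) for s in sensors]
    z = np.concatenate(latents)   # retains sensor-specific structure
    return z @ w_fuse + b_fuse

x = {s: rng.normal(size=d_in) for s in sensors}
scores = fuse_and_classify(x)
print(scores.shape)  # (20,)
```

The contrast with feature-level fusion is that the branches here are learned per sensor before combination, so the concatenated latent can carry both shared and sensor-specific information rather than a single pooled feature vector.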
Pages: 11373-11381
Page count: 9
Related Papers
50 records in total
  • [1] Word-Level ASL Recognition and Trigger Sign Detection with RF Sensors
    Rahman, M. Mahbubur
    Kurtoglu, Emre
    Mdrafi, Robiulhossain
    Gurbuz, Ali C.
    Malaia, Evie
    Crawford, Chris
    Griffin, Darrin
    Gurbuz, Sevgi Z.
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 8233 - 8237
  • [2] ASL Recognition Based on Kinematics Derived from a Multi-Frequency RF Sensor Network
    Gurbuz, Sevgi Z.
    Gurbuz, Ali C.
    Malaia, Evie A.
    Griffin, Darrin J.
    Crawford, Chris
    Kurtoglu, Emre
    Rahman, M. Mahbubur
    Aksu, Ridvan
    Mdrafi, Robiulhossain
    2020 IEEE SENSORS, 2020,
  • [3] WORD-LEVEL RECOGNITION OF CURSIVE SCRIPT
    FARAG, RFH
    IEEE TRANSACTIONS ON COMPUTERS, 1979, 28 (02) : 172 - 175
  • [4] Multi-Frequency RF Sensor Data Adaptation for Motion Recognition with Multi-Modal Deep Learning
    Rahman, M. Mahbubur
    Gurbuz, Sevgi Z.
    2021 IEEE RADAR CONFERENCE (RADARCONF21): RADAR ON THE MOVE, 2021,
  • [5] Word-level Speech Recognition with a Letter to Word Encoder
    Collobert, Ronan
    Hannun, Awni
    Synnaeve, Gabriel
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119, 2020, 119
  • [6] Word-level Speech Recognition with a Letter to Word Encoder
    Collobert, Ronan
    Hannun, Awni
    Synnaeve, Gabriel
    25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019), 2019,
  • [7] Detecting Depression with Word-Level Multimodal Fusion
    Rohanian, Morteza
    Hough, Julian
    Purver, Matthew
    INTERSPEECH 2019, 2019, : 1443 - 1447
  • [8] WISE: Word-Level Interaction-Based Multimodal Fusion for Speech Emotion Recognition
    Shen, Guang
    Lai, Riwei
    Chen, Rui
    Zhang, Yu
    Zhang, Kejia
    Han, Qilong
    Song, Hongtao
    INTERSPEECH 2020, 2020, : 369 - 373
  • [9] EFFECTS OF WORD-LEVEL AND SENTENCE-LEVEL CONTEXTS UPON WORD RECOGNITION
    COLOMBO, L
    WILLIAMS, J
    MEMORY & COGNITION, 1990, 18 (02) : 153 - 163
  • [10] An analytical handwritten word recognition system with word-level discriminant training
    Tay, YH
    Lallican, PM
    Khalid, M
    Knerr, S
    Viard-Gaudin, C
    SIXTH INTERNATIONAL CONFERENCE ON DOCUMENT ANALYSIS AND RECOGNITION, PROCEEDINGS, 2001, : 726 - 730