An End-to-End Air Writing Recognition Method Based on Transformer

Cited: 0
Authors
Tan, Xuhang [1 ]
Tong, Jicheng [1 ]
Matsumaru, Takafumi [1 ]
Dutta, Vibekananda [2 ]
He, Xin [1 ]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Wakamatsu Ku, Kitakyushu, Fukuoka 8080135, Japan
[2] Warsaw Univ Technol, Inst Micromech & Photon, Fac Mechatron, PL-00661 Warsaw, Poland
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords
Writing; Character recognition; Task analysis; Visualization; Transformers; Trajectory; Data augmentation; Human computer interaction; Air writing recognition; transformer model; human-computer interaction (HCI);
DOI
10.1109/ACCESS.2023.3321807
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The air-writing recognition task requires a computer to directly recognize and interpret user input produced by finger movements in the air. This form of human-computer interaction (HCI) is considered natural, cost-effective, and immersive. While conventional air-writing recognition has focused primarily on recognizing individual characters, a 2022 advancement introduced the writing in the air (WiTA) task to address continuous air writing. In this context, we assert that a Transformer-based approach can offer improved performance on the WiTA task. To solve it, this study formulates an end-to-end air-writing recognition method called TR-AWR, which leverages the Transformer model. Our proposed method adopts a holistic approach, taking video frame sequences as input and generating letter sequences as output. To enhance performance on the WiTA task, our method combines the vision transformer model with the traditional transformer model and introduces data augmentation techniques to this task for the first time. Our approach achieves a character error rate (CER) of 29.86% and a decoding speed of 194.67 frames per second (D-fps). Notably, our method outperforms the baseline models in recognition accuracy while maintaining a degree of real-time performance. The contributions of this paper are as follows. First, this study is the first to incorporate the Transformer method into continuous air-writing recognition research, reducing overall complexity and attaining improved results. Second, we adopt an end-to-end approach that streamlines the entire recognition process. Lastly, we propose data augmentation guidelines tailored explicitly to the WiTA task. In summary, our study presents a promising direction for effectively addressing the WiTA task and holds potential for further advancement in this domain.
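The character error rate (CER) reported above is conventionally defined as the Levenshtein edit distance between the predicted and reference letter sequences, normalized by the reference length. A minimal sketch of that metric (the function name and example strings are illustrative, not from the paper):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein edit distance / reference length."""
    m, n = len(reference), len(hypothesis)
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j]
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / m

# One substitution in a five-letter reference -> CER 0.2
print(cer("hello", "hallo"))  # 0.2
```

Under this definition, the reported CER of 29.86% means roughly three character-level edits per ten reference characters.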
Pages: 109885-109898 (14 pages)