An End-to-End Air Writing Recognition Method Based on Transformer

Cited by: 0
Authors
Tan, Xuhang [1 ]
Tong, Jicheng [1 ]
Matsumaru, Takafumi [1 ]
Dutta, Vibekananda [2 ]
He, Xin [1 ]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Wakamatsu Ku, Kitakyushu, Fukuoka 8080135, Japan
[2] Warsaw Univ Technol, Inst Micromech & Photon, Fac Mechatron, PL-00661 Warsaw, Poland
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords
Writing; Character recognition; Task analysis; Visualization; Transformers; Trajectory; Data augmentation; Human computer interaction; Air writing recognition; transformer model; human-computer interaction (HCI);
DOI
10.1109/ACCESS.2023.3321807
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline Code
0812
Abstract
The air-writing recognition task entails the computer's ability to directly recognize and interpret user input generated by finger movements in the air. This form of interaction between humans and computers is considered natural, cost-effective, and immersive within the domain of human-computer interaction (HCI). While conventional air-writing recognition has primarily focused on recognizing individual characters, a recent advancement in 2022 introduced the concept of writing in the air (WiTA) to address continuous air-writing tasks. In this context, we assert that the Transformer-based approach can offer improved performance for the WiTA task. To solve the WiTA task, this study formulated an end-to-end air-writing recognition method called TR-AWR, which leverages the Transformer model. Our proposed method adopts a holistic approach by utilizing video frame sequences as input and generating letter sequences as outputs. To enhance the performance of the WiTA task, our method combines the vision transformer model with the traditional transformer model, while introducing data augmentation techniques for the first time. Our approach achieves a character error rate (CER) of 29.86% and a decoding frames per second (D-fps) value of 194.67 fps. Notably, our method outperforms the baseline models in terms of recognition accuracy while maintaining a certain level of real-time performance. The contributions of this paper are as follows: Firstly, this study is the first to incorporate the Transformer method into continuous air-writing recognition research, thereby reducing overall complexity and attaining improved results. Additionally, we adopt an end-to-end approach that streamlines the entire recognition process. Lastly, we propose specific data augmentation guidelines tailored explicitly for the WiTA task. In summary, our study presents a promising direction for effectively addressing the WiTA task and holds potential for further advancements in this domain.
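The abstract reports a character error rate (CER) of 29.86%. As a point of reference, here is a minimal sketch of how CER is conventionally computed — the Levenshtein edit distance between the predicted and reference letter sequences, normalized by the reference length. The function names are illustrative, not taken from the paper:

```python
def levenshtein(ref, hyp):
    # Dynamic-programming edit distance between two sequences,
    # using a single rolling row for O(len(hyp)) memory.
    m, n = len(ref), len(hyp)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (ref[i - 1] != hyp[j - 1]))  # substitution
            prev = cur
    return dp[n]

def cer(reference, hypothesis):
    # Character error rate: edit distance normalized by reference length.
    return levenshtein(reference, hypothesis) / len(reference)

print(f"{cer('hello world', 'helo wrold'):.4f}")  # → 0.2727
```

A CER below 1.0 does not imply most characters are correct in order; it only bounds the number of edit operations relative to the reference length, which is why continuous-recognition papers report it alongside throughput metrics such as D-fps.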
Pages: 109885 - 109898 (14 pages)
Related Papers
50 records (showing 31 - 40)
  • [31] An end-to-end face recognition method with alignment learning
    Tang, Fenggao
    Wu, Xuedong
    Zhu, Zhiyu
    Wan, Zhengang
    Chang, Yanchao
    Du, Zhaoping
    Gu, Lili
    OPTIK, 2020, 205
  • [32] A NOVEL END-TO-END SPEECH EMOTION RECOGNITION NETWORK WITH STACKED TRANSFORMER LAYERS
    Wang, Xianfeng
    Wang, Min
    Qi, Wenbo
    Su, Wanqi
    Wang, Xiangqian
    Zhou, Huan
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6289 - 6293
  • [33] SwinOCSR: end-to-end optical chemical structure recognition using a Swin Transformer
    Xu, Zhanpeng
    Li, Jianhua
    Yang, Zhaopeng
    Li, Shiliang
    Li, Honglin
    JOURNAL OF CHEMINFORMATICS, 2022, 14 (01)
  • [34] Variable Scale Pruning for Transformer Model Compression in End-to-End Speech Recognition
    Ben Letaifa, Leila
    Rouas, Jean-Luc
    ALGORITHMS, 2023, 16 (09)
  • [35] End-to-End Neural Transformer Based Spoken Language Understanding
    Radfar, Martin
    Mouchtaris, Athanasios
    Kunzmann, Siegfried
    INTERSPEECH 2020, 2020, : 866 - 870
  • [37] Transformer-based end-to-end speech recognition with residual Gaussian-based self-attention
    Liang, Chengdong
    Xu, Menglong
    Zhang, Xiao-Lei
INTERSPEECH 2021, 2021, 2 : 1495 - 1499
  • [38] Investigation of Transformer based Spelling Correction Model for CTC-based End-to-End Mandarin Speech Recognition
    Zhang, Shiliang
    Lei, Ming
    Yan, Zhijie
    INTERSPEECH 2019, 2019, : 2180 - 2184
  • [39] A CTC Alignment-Based Non-Autoregressive Transformer for End-to-End Automatic Speech Recognition
    Fan, Ruchao
    Chu, Wei
    Chang, Peng
    Alwan, Abeer
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2023, 31 : 1436 - 1448
  • [40] An End-to-End Underwater Acoustic Target Recognition Model Based on One-Dimensional Convolution and Transformer
    Yang, Kang
    Wang, Biao
    Fang, Zide
    Cai, Banggui
    JOURNAL OF MARINE SCIENCE AND ENGINEERING, 2024, 12 (10)