An End-to-End Air Writing Recognition Method Based on Transformer

Authors
Tan, Xuhang [1]
Tong, Jicheng [1]
Matsumaru, Takafumi [1]
Dutta, Vibekananda [2]
He, Xin [1]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Wakamatsu Ku, Kitakyushu, Fukuoka 8080135, Japan
[2] Warsaw Univ Technol, Inst Micromech & Photon, Fac Mechatron, PL-00661 Warsaw, Poland
Funding
Japan Society for the Promotion of Science;
Keywords
Writing; Character recognition; Task analysis; Visualization; Transformers; Trajectory; Data augmentation; Human computer interaction; Air writing recognition; transformer model; human-computer interaction (HCI);
DOI
10.1109/ACCESS.2023.3321807
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The air-writing recognition task entails the computer's ability to directly recognize and interpret user input generated by finger movements in the air. This form of interaction between humans and computers is considered natural, cost-effective, and immersive within the domain of human-computer interaction (HCI). While conventional air-writing recognition has primarily focused on recognizing individual characters, a recent advancement in 2022 introduced the concept of writing in the air (WiTA) to address continuous air-writing tasks. In this context, we assert that the Transformer-based approach can offer improved performance for the WiTA task. To solve the WiTA task, this study formulated an end-to-end air-writing recognition method called TR-AWR, which leverages the Transformer model. Our proposed method adopts a holistic approach by utilizing video frame sequences as input and generating letter sequences as outputs. To enhance the performance of the WiTA task, our method combines the vision transformer model with the traditional transformer model, while introducing data augmentation techniques for the first time. Our approach achieves a character error rate (CER) of 29.86% and a decoding frames per second (D-fps) value of 194.67 fps. Notably, our method outperforms the baseline models in terms of recognition accuracy while maintaining a certain level of real-time performance. The contributions of this paper are as follows: Firstly, this study is the first to incorporate the Transformer method into continuous air-writing recognition research, thereby reducing overall complexity and attaining improved results. Additionally, we adopt an end-to-end approach that streamlines the entire recognition process. Lastly, we propose specific data augmentation guidelines tailored explicitly for the WiTA task. In summary, our study presents a promising direction for effectively addressing the WiTA task and holds potential for further advancements in this domain.
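The abstract reports a character error rate (CER) of 29.86%. As a rough illustration of how such a figure is commonly computed, the sketch below assumes the usual definition: total Levenshtein edit distance between predicted and reference letter sequences, divided by the total number of reference characters. The function names and the corpus-level averaging are assumptions made for this example; the paper's exact evaluation protocol is not stated in the abstract.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to turn `hyp` into `ref` (standard dynamic programming)."""
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from an empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(refs: Sequence[str], hyps: Sequence[str]) -> float:
    """Corpus-level CER: summed edit distance over summed reference length.
    (Assumed definition; hypothetical helper, not taken from the paper.)"""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_edits = sum(levenshtein(r, h) for r, h in zip(refs, hyps))
    return total_edits / total_chars
```

For example, `character_error_rate(["hello"], ["hallo"])` counts one substitution over five reference characters. Under this definition a CER can exceed 1.0 when the hypothesis is much longer than the reference.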
Pages: 109885-109898
Page count: 14