An End-to-End Air Writing Recognition Method Based on Transformer

Times Cited: 0
Authors
Tan, Xuhang [1 ]
Tong, Jicheng [1 ]
Matsumaru, Takafumi [1 ]
Dutta, Vibekananda [2 ]
He, Xin [1 ]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Wakamatsu Ku, Kitakyushu, Fukuoka 8080135, Japan
[2] Warsaw Univ Technol, Inst Micromech & Photon, Fac Mechatron, PL-00661 Warsaw, Poland
Funding
Japan Society for the Promotion of Science;
Keywords
Writing; Character recognition; Task analysis; Visualization; Transformers; Trajectory; Data augmentation; Human computer interaction; Air writing recognition; transformer model; human-computer interaction (HCI);
DOI
10.1109/ACCESS.2023.3321807
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology];
Discipline Classification Code
0812;
Abstract
The air-writing recognition task requires a computer to directly recognize and interpret user input generated by finger movements in the air. This form of interaction between humans and computers is considered natural, cost-effective, and immersive within the domain of human-computer interaction (HCI). While conventional air-writing recognition has primarily focused on recognizing individual characters, an advance in 2022 introduced the concept of writing in the air (WiTA) to address continuous air-writing tasks. In this context, we assert that a Transformer-based approach can offer improved performance on the WiTA task. To solve the WiTA task, this study formulates an end-to-end air-writing recognition method called TR-AWR, which leverages the Transformer model. Our proposed method adopts a holistic approach, taking video frame sequences as input and generating letter sequences as output. To enhance performance on the WiTA task, our method combines a vision transformer with a traditional transformer and introduces data augmentation techniques to this task for the first time. Our approach achieves a character error rate (CER) of 29.86% and a decoding speed (D-fps) of 194.67 frames per second. Notably, our method outperforms the baseline models in recognition accuracy while maintaining a reasonable degree of real-time performance. The contributions of this paper are as follows. First, this study is the first to incorporate the Transformer method into continuous air-writing recognition research, reducing overall complexity and attaining improved results. Second, we adopt an end-to-end approach that streamlines the entire recognition process. Third, we propose data augmentation guidelines tailored specifically to the WiTA task. In summary, our study presents a promising direction for effectively addressing the WiTA task and holds potential for further advancements in this domain.
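For readers who want a concrete picture of the pipeline the abstract describes, the following is a minimal, illustrative sketch of a vision-transformer-style frame encoder feeding a standard Transformer decoder that emits letter tokens. It is not the authors' TR-AWR implementation: the PyTorch nn.Transformer backbone, the per-frame patch pooling, the module names, dimensions, and vocabulary size are all assumptions made only for this example.

import torch
import torch.nn as nn


class AirWritingSeq2Seq(nn.Module):
    """Illustrative sketch only: ViT-style frame encoder + Transformer decoder."""

    def __init__(self, vocab_size=30, d_model=256, nhead=8, num_layers=4,
                 frame_size=64, patch=16, max_len=32):
        super().__init__()
        # Patch embedding: split each RGB frame into patches and project to d_model.
        self.patch_embed = nn.Conv2d(3, d_model, kernel_size=patch, stride=patch)
        num_patches = (frame_size // patch) ** 2
        self.patch_pos = nn.Parameter(torch.zeros(1, num_patches, d_model))
        # Token embedding for the output letter sequence (vocabulary size is assumed).
        self.tok_embed = nn.Embedding(vocab_size, d_model)
        self.tok_pos = nn.Parameter(torch.zeros(1, max_len, d_model))
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            batch_first=True)
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, frames, tgt_tokens):
        # frames: (batch, time, 3, H, W); tgt_tokens: (batch, seq) shifted letter IDs.
        b, t = frames.shape[:2]
        x = self.patch_embed(frames.flatten(0, 1))         # (b*t, d_model, H/p, W/p)
        x = x.flatten(2).transpose(1, 2) + self.patch_pos  # (b*t, patches, d_model)
        # Average the patch tokens of each frame, then treat the frame sequence as
        # the encoder input (one simple way to fuse spatial and temporal cues).
        memory_in = x.mean(dim=1).view(b, t, -1)           # (b, t, d_model)
        tgt = self.tok_embed(tgt_tokens) + self.tok_pos[:, :tgt_tokens.size(1)]
        causal = self.transformer.generate_square_subsequent_mask(tgt_tokens.size(1))
        h = self.transformer(memory_in, tgt, tgt_mask=causal)
        return self.out(h)                                 # (batch, seq, vocab) logits


if __name__ == "__main__":
    model = AirWritingSeq2Seq()
    frames = torch.randn(2, 10, 3, 64, 64)   # two clips of ten 64x64 RGB frames
    letters = torch.randint(0, 30, (2, 5))   # dummy target letter IDs
    print(model(frames, letters).shape)      # torch.Size([2, 5, 30])

Under the same assumptions, training would minimize cross-entropy over the letter logits, and evaluation would report the character error rate (CER) quoted in the abstract, i.e., the edit distance between decoded and reference letter sequences divided by the reference length.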
Pages: 109885-109898
Page count: 14