Construction of an Online Cloud Platform for Zhuang Speech Recognition and Translation with Edge-Computing-Based Deep Learning Algorithm

Cited by: 1
Authors
Fan, Zeping [1, 2]
Huang, Min [1, 2]
Zhang, Xuejun [1, 2, 3]
Liu, Rongqi [1, 2]
Lyu, Xinyi [1]
Duan, Taisen [1, 2]
Bu, Zhaohui [4]
Liang, Jianghua [5]
Affiliations
[1] Guangxi Univ, Sch Comp & Elect & Informat, Nanning 530004, Peoples R China
[2] Guangxi Univ, Guangxi Key Lab Multimedia Commun & Network Techno, Nanning 530004, Peoples R China
[3] Guangxi Big White & Little Black Robots Co Ltd, Nanning 530007, Peoples R China
[4] Guangxi Univ, Sch Foreign Language, Nanning 530004, Peoples R China
[5] Guangxi Univ, Sch Journalism & Commun, Nanning 530004, Peoples R China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 22
Keywords
automatic speech recognition; natural language processing; neural machine translation; transformer; cloud edge computing; network programming
DOI
10.3390/app132212184
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The Zhuang ethnic minority in China possesses its own ethnic language but no ethnic script. Cultural exchange and transmission encounter hurdles because the Zhuang rely exclusively on oral communication, so an online cloud-based platform is required to enhance linguistic communication. First, a database of 200 h of annotated Zhuang speech was created by collecting standard Zhuang speech and improving database quality through the removal of transcription inconsistencies and text normalization. Second, SAformerNet, a more efficient and accurate transformer-based automatic speech recognition (ASR) network, is obtained by inserting additional downsampling modules. Subsequently, a Neural Machine Translation (NMT) model for translating Zhuang into other languages is constructed by fine-tuning the BART model and applying a corpus filtering strategy. Finally, to make the network responsive to real-world needs, edge-computing techniques are applied to relieve network bandwidth pressure, and an edge-computing private cloud system based on FPGA acceleration is proposed to improve model operation efficiency. Experiments show that the most critical metric of the system, model accuracy, is above 93%, and inference time is reduced by 29%. The computational delay of the multi-head self-attention (MHSA) and feed-forward network (FFN) modules is reduced by factors of 7.1 and 1.9, respectively, and terminal response time is accelerated by 20% on average. Overall, the scheme provides a prototype tool for small-scale Zhuang remote natural language tasks in mountainous areas.
Pages: 19