CONVOLUTION-BASED ATTENTION MODEL WITH POSITIONAL ENCODING FOR STREAMING SPEECH RECOGNITION ON EMBEDDED DEVICES

Cited by: 4

Authors:

Park, Jinhwan [1]

Kim, Chanwoo [2]

Sung, Wonyong [1]

Affiliations:

[1] Seoul Natl Univ, Seoul, South Korea

[2] Samsung Res, Seoul, South Korea

Source:

2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021

Funding:

National Research Foundation, Singapore

Keywords:

DOI:

10.1109/SLT48900.2021.9383583

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

On-device automatic speech recognition (ASR) is strongly preferred over server-based implementations owing to its low latency and privacy protection. Many server-based ASR systems employ recurrent neural networks (RNNs) to exploit their ability to recognize long sequences with a limited number of states; however, RNNs are inefficient for single-stream implementations on embedded devices. In this study, a highly efficient convolutional model-based ASR with monotonic chunkwise attention is developed. Although temporal convolution-based models allow more efficient implementations, they demand a long filter length to avoid looping or skipping problems. To remedy this problem, we add positional encoding to a convolution-based ASR encoder while shortening the filter length. It is demonstrated that the accuracy of the short filter-length convolutional model is significantly improved. In addition, the effect of positional encoding is analyzed by visualizing the attention energy and encoder outputs. The proposed model achieves a word error rate of 11.20% on TED-LIUMv2 for an end-to-end speech recognition task.


Pages: 30-37

Page count: 8
