CONVOLUTION-BASED ATTENTION MODEL WITH POSITIONAL ENCODING FOR STREAMING SPEECH RECOGNITION ON EMBEDDED DEVICES

Cited by: 4
Authors
Park, Jinhwan [1 ]
Kim, Chanwoo [2 ]
Sung, Wonyong [1 ]
Affiliations
[1] Seoul Natl Univ, Seoul, South Korea
[2] Samsung Res, Seoul, South Korea
Source
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT) | 2021
Funding
National Research Foundation, Singapore;
Keywords
DOI
10.1109/SLT48900.2021.9383583
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
On-device automatic speech recognition (ASR) is often preferred over server-based implementations owing to its low latency and privacy protection. Many server-based ASR systems employ recurrent neural networks (RNNs) to exploit their ability to recognize long sequences with a limited number of states; however, RNNs are inefficient for single-stream implementations on embedded devices. In this study, a highly efficient convolutional model-based ASR system with monotonic chunkwise attention is developed. Although temporal convolution-based models allow more efficient implementations, they demand a long filter length to avoid looping or skipping problems. To remedy this problem, we add positional encoding to a convolution-based ASR encoder while shortening the filter length. It is demonstrated that the accuracy of the short filter-length convolutional model is significantly improved. In addition, the effect of positional encoding is analyzed by visualizing the attention energy and encoder outputs. The proposed model achieves a word error rate of 11.20% on TED-LIUMv2 for an end-to-end speech recognition task.
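As a rough illustration of what adding positional encoding to encoder features can look like, the sketch below builds a standard sinusoidal position table and adds it element-wise to a sequence of feature frames. The abstract does not state which form of positional encoding the paper uses, so the sinusoidal variant, the function names, and the `base` constant are all assumptions made for illustration, not the authors' implementation.

```python
import math


def sinusoidal_positional_encoding(num_frames, dim, base=10000.0):
    """Return a num_frames x dim table of sinusoidal position codes.

    Assumption: the widely used sin/cos scheme; the paper's actual
    encoding may differ.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_frames):
        row = []
        for i in range(0, dim, 2):
            # Each sin/cos pair shares one wavelength, growing with i.
            angle = pos / (base ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


def add_positional_encoding(features):
    """Add position codes element-wise to a list of feature frames."""
    pe = sinusoidal_positional_encoding(len(features), len(features[0]))
    return [
        [x + p for x, p in zip(frame, code)]
        for frame, code in zip(features, pe)
    ]
```

Because each frame receives a distinct code, two acoustically similar frames at different times no longer look identical to the attention mechanism, which is one plausible reading of why position information could help a short-filter encoder.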
Pages: 30-37
Number of pages: 8