Real-time translation of English speech through speech feature extraction

Authors
Lei, Xiaoyan [1]
Affiliation
[1] Henan Mech & Elect Vocat Coll, 1 Taishan Rd, Zhengzhou 451191, Henan, Peoples R China
Keywords
Speech feature; English speech; Real-time translation; Transformer
DOI
10.1007/s10015-024-00951-w
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Real-time English speech translation is useful in numerous situations, including business and travel. The goal of this research is to improve the quality of real-time English speech translation. First, filter bank (FBank) features were extracted from English speech. An enhanced Transformer model was then introduced, incorporating a causal convolution module in the front end of the encoder to capture English speech features with location information. The performance of the optimized model in translating English speech into different target languages was tested on the MuST-C dataset. The results showed that translation quality with the improved Transformer varied by target language. The highest bilingual evaluation understudy (BLEU) score was observed for Spanish text at 20.84, while Russian text obtained the lowest score of 10.56. The average BLEU score was 18.51, with an average lag of 1202.33 ms. Compared to the conventional Transformer model, the improved model achieved higher BLEU scores and lower delay, and it performed best with a convolutional kernel size of 3 × 3. The results demonstrate the dependability of the improved Transformer model in real-time English speech translation, highlighting its practical usefulness.
Pages: 410-415
Page count: 6
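The abstract mentions a causal convolution module at the front of the encoder but gives no implementation details. As a rough illustration only, the sketch below shows what "causal" means for a convolution over speech frames: the output at frame t depends only on frame t and earlier frames, which is what makes it suitable for real-time use. It is a simplification, not the paper's module: it assumes a single 1-D kernel shared across all feature channels and applied along the time axis (the paper reports a 3 × 3 kernel), and the function name `causal_conv1d` and the toy input values are hypothetical.

```python
from typing import List


def causal_conv1d(frames: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Apply one causal 1-D kernel along time to every feature channel.

    frames: T frames, each holding F feature values (e.g. FBank coefficients).
    kernel: K weights; kernel[-1] multiplies the current frame and
            kernel[0] multiplies the frame K-1 steps in the past.
    Assumption: this is an illustrative simplification, not the paper's module.
    """
    if not frames:
        return []
    k = len(kernel)
    n_feats = len(frames[0])
    # Left-pad with K-1 zero frames so output[t] never sees frames after t.
    padded = [[0.0] * n_feats for _ in range(k - 1)] + frames
    out = []
    for t in range(len(frames)):
        window = padded[t:t + k]  # original frames t-K+1 .. t (zeros before the start)
        out.append([
            sum(w * frame[f] for w, frame in zip(kernel, window))
            for f in range(n_feats)
        ])
    return out


# Toy input: 3 frames with 2 features each, and a kernel of size 3.
toy_frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_kernel = [1.0, 1.0, 1.0]
toy_output = causal_conv1d(toy_frames, toy_kernel)
print(toy_output)
```

Because the padding is applied only on the left, changing a later frame leaves all earlier outputs unchanged, so the layer can run as audio arrives without waiting for future frames.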