Real-time translation of English speech through speech feature extraction

Cited by: 0
Author
Lei, Xiaoyan [1]
Affiliation
[1] Henan Mech & Elect Vocat Coll, 1 Taishan Rd, Zhengzhou 451191, Henan, Peoples R China
Keywords
Speech feature; English speech; Real-time translation; Transformer
DOI
10.1007/s10015-024-00951-w
Chinese Library Classification number
TP24 [Robot technology]
Subject classification code
080202; 1405
Abstract
Real-time English speech translation is useful in many situations, including business and travel. This research aims to improve the efficacy of real-time English speech translation. First, filter bank (FBank) features were extracted from English speech. An enhanced Transformer model was then introduced, with a causal convolution module added at the front end of the encoder to capture English speech features together with location information. The optimized model was tested on the MuST-C dataset by translating English speech into different target languages. Translation quality differed across target languages: the highest bilingual evaluation understudy (BLEU) score was 20.84 for Spanish text, and the lowest was 10.56 for Russian text. The average BLEU score was 18.51, with an average time delay of 1202.33 ms. Compared with the conventional Transformer model, the improved model achieved higher BLEU scores and lower time delay, and it performed best with a convolutional kernel size of 3 × 3. The results demonstrate the dependability of the improved Transformer model in real-time English speech translation and highlight its practical usefulness.
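The causal convolution idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `causal_conv1d`, the kernel weights, and the toy input are assumptions made for illustration, and the convolution is simplified to one dimension along time, whereas the abstract reports a 3 × 3 kernel. The sketch only shows the defining property of a causal layer, namely that the output at frame t depends on frames up to t and never on later ones, which is what makes it usable on streaming input.

```python
import numpy as np


def causal_conv1d(frames, kernel):
    """Causal convolution along the time axis (illustrative helper).

    frames: (T, F) array, one row of features (e.g. FBank) per time step.
    kernel: (K,) array; kernel[-1] weights the current frame,
            kernel[0] weights the oldest frame in the window.
    Returns a (T, F) array whose row t uses only frames t-K+1 .. t.
    """
    T, F = frames.shape
    K = kernel.shape[0]
    # Left-pad with K-1 zero frames so no future frame is ever read.
    padded = np.vstack([np.zeros((K - 1, F)), frames])
    out = np.zeros((T, F))
    for t in range(T):
        window = padded[t:t + K]   # rows correspond to frames t-K+1 .. t
        out[t] = kernel @ window   # (K,) @ (K, F) -> (F,)
    return out


if __name__ == "__main__":
    # Toy input: 6 frames with 2 features each (hypothetical values).
    frames = np.arange(12, dtype=float).reshape(6, 2)
    kernel = np.array([0.25, 0.25, 0.5])  # hypothetical weights, K = 3

    out = causal_conv1d(frames, kernel)

    # Perturb only the last two frames; earlier outputs must not change.
    changed = frames.copy()
    changed[4:] += 100.0
    out_changed = causal_conv1d(changed, kernel)
    print(np.allclose(out[:4], out_changed[:4]))
```

Left-padding by K-1 frames, rather than padding symmetrically, is the design choice that keeps the layer causal while preserving the sequence length.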
Pages: 410-415
Page count: 6