Automatic Speech Recognition with Machine Learning: Techniques and Evaluation of Current Tools

Authors
Fayan R. [1]
Montajabi Z. [2]
Gonsalves R. [1]
Affiliations
[1] Avid, United States
[2] Avid Technology, United States
Source
SMPTE Motion Imaging Journal | 2024 / Vol. 133 / No. 02
Keywords
ARTIFICIAL INTELLIGENCE; AUTOMATIC SPEECH RECOGNITION; MACHINE LEARNING
DOI
10.5594/JMI.2024/IPYX8877
Abstract
This research offers an in-depth review of current Automatic Speech Recognition (ASR) methods and their significant impact on media production, with a focus on the transformer model's self-attention mechanism for understanding sequential relationships. It compares the accuracy and performance of top ASR models such as Meta's Massively Multilingual Speech, OpenAI's Whisper, and Google's Universal Speech Model, along with services from Microsoft Azure, Amazon Web Services, and Google Cloud Platform. The study examines key ASR aspects, including voice activity detection, language identification, and multilanguage support, and evaluates their accuracy metrics. Challenges such as limited data for certain languages and complexities in linguistic nuances are highlighted. Additionally, the paper discusses ASR's role in media production, from creating time-based captions to transforming editing techniques. By analyzing the ASR process from audio preprocessing to post-processing, the research bridges academic and practical perspectives, enabling media producers to utilize advanced ASR technologies effectively. © 2002 Society of Motion Picture and Television Engineers, Inc.
Pages: 48-57
Page count: 9