A computer-aided speech analytics approach for pronunciation feedback using deep feature clustering

Authors
Faria Nazir
Muhammad Nadeem Majeed
Mustansar Ali Ghazanfar
Muazzam Maqsood
Affiliations
[1] University of Engineering and Technology Taxila, Department of Software Engineering
[2] University of the Punjab, Department of Data Science
[3] University of East London, School of Architecture, Computing and Engineering
[4] COMSATS University Islamabad, Department of Computer Science
Source
Multimedia Systems, 2023, Vol. 29 (3)
Keywords
Speech analytics; Deep convolutional neural network; Multimedia tools; Deep clustering; Phone variation model
Abstract
Demand for language learning is growing because people increasingly need to communicate across regions for business, study, and other purposes. Learners make many pronunciation mistakes because they are unfamiliar with the new language and because accents differ. In this paper, we analyze speech mistakes using deep feature-based clustering. We propose two novel methods for speech analysis: one for phonemic errors (confusing phonemes) and one for prosodic errors (partially changed pronunciation variations of phones). Accurate and efficient language learning requires correcting both phonemic and prosodic errors. In the first method, we combine deep CNN features with a clustering algorithm to detect phonemic errors, and classify the phonemes using K-nearest neighbor, Naïve Bayes, and support vector machine (SVM) classifiers. Experiments on the six most frequently mispronounced confusing phoneme pairs of Arabic achieve an accuracy of 94%. In the second method, we propose an unsupervised phone variation model (PVM) to detect prosodic errors. In the PVM, each phone is extended to represent the different pronunciation variations of that phone at different proficiency levels. We use an Arabic dataset of 28 individual phones for speech analysis, provide feedback based on the variation of each phone, and achieve an accuracy of 97%.
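The idea of extending one phone into several pronunciation variants can be illustrated with a small clustering sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm, so plain k-means is swapped in here as an assumption, and the 2-D vectors are synthetic stand-ins for deep CNN features. All names (`kmeans`, `nearest`, `features`) are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(centroids, x):
    """Index of the centroid closest to feature vector x."""
    return min(range(len(centroids)), key=lambda i: dist(centroids[i], x))

def init_centroids(points, k):
    """Deterministic farthest-point initialisation (an assumption, not from the paper)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, iters=20):
    """Plain k-means: group one phone's utterances into k variant clusters."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        updated = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
        if updated == centroids:  # converged
            break
        centroids = updated
    return centroids

# Synthetic 2-D "deep features" for utterances of a single phone.
# Real CNN embeddings would have many more dimensions.
features = [
    (0.0, 0.1), (0.2, 0.0), (0.1, 0.2),     # one pronunciation variant
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0),     # a second variant
    (10.0, 0.1), (9.8, 0.0), (10.1, -0.1),  # a third variant
]

centroids = kmeans(features, k=3)

# Feedback for a new utterance: report which variant cluster it falls into.
new_utterance = (5.1, 5.0)
variant = nearest(centroids, new_utterance)
print("new utterance belongs to variant cluster", variant)
```

In a feedback setting, each cluster would be mapped to a variation of the phone, and a learner's utterance would be matched to the closest one. How the clusters relate to proficiency levels is defined by the paper's model, not by this sketch.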
Pages: 1699-1715
Page count: 16