Phonological feature-based speech recognition system for pronunciation training in non-native language learning

Cited by: 18
Authors
Arora, Vipul [1]
Lahiri, Aditi [1]
Reetz, Henning [2]
Affiliations
[1] Univ Oxford, Fac Linguist Philol & Phonet, Oxford, England
[2] Goethe Univ, Frankfurt, Germany
Funding
European Research Council
Keywords
MISPRONUNCIATION DETECTION; ACOUSTIC INVARIANCE; STOP CONSONANTS; VISUAL FEEDBACK; ARTICULATION; DIAGNOSIS; FRAMEWORK; MODELS; PLACE;
DOI
10.1121/1.5017834
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
The authors address the question of whether phonological features can be used effectively in an automatic speech recognition (ASR) system for pronunciation training in non-native language (L2) learning. Computer-aided pronunciation training consists of two essential tasks: detecting mispronunciations and providing corrective feedback, usually on the basis of either full words or phonemes. Phonemes, however, can be further decomposed into phonological features, which in turn define groups of phonemes. A phonological feature-based ASR system allows the authors to perform a sub-phonemic analysis at the feature level, providing more effective feedback for reaching the acoustic goal and perceptual constancy. Furthermore, phonological features provide a structured way of analysing the types of errors a learner makes, and can readily convey which pronunciations need improvement. This paper presents the authors' implementation of such an ASR system, using deep neural networks as the acoustic model, and its use for detecting mispronunciations, analysing errors, and rendering corrective feedback. Quantitative as well as qualitative evaluations are carried out for German and Italian learners of English. In addition to achieving high accuracy in mispronunciation detection, the system also provides accurate diagnosis of errors. © 2018 Acoustical Society of America.
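The idea that phonemes decompose into features, and that features in turn define groups of phonemes, can be illustrated with a minimal sketch. The feature table below is a toy assumption (three textbook binary features for six English coronal consonants, with "th" and "dh" standing for the sounds in "thin" and "this"); it is not the feature set or the DNN-based method of the paper, and the function names are hypothetical.

```python
# Toy phoneme-to-feature table. The features and values are illustrative
# assumptions, not the feature inventory used in the paper.
FEATURES = {
    "t":  {"voiced": False, "continuant": False, "strident": False},
    "d":  {"voiced": True,  "continuant": False, "strident": False},
    "s":  {"voiced": False, "continuant": True,  "strident": True},
    "z":  {"voiced": True,  "continuant": True,  "strident": True},
    "th": {"voiced": False, "continuant": True,  "strident": False},
    "dh": {"voiced": True,  "continuant": True,  "strident": False},
}


def natural_class(feature, value):
    """Return the group of phonemes that share one feature value."""
    return sorted(p for p, f in FEATURES.items() if f[feature] == value)


def diagnose(target, produced):
    """Return the features on which the produced phoneme differs from the
    target, as {feature: (expected_value, produced_value)}."""
    expected, actual = FEATURES[target], FEATURES[produced]
    return {
        name: (expected[name], actual[name])
        for name in expected
        if expected[name] != actual[name]
    }


def feedback(target, produced):
    """Turn a feature-level diagnosis into a short corrective message."""
    diffs = diagnose(target, produced)
    if not diffs:
        return f"/{target}/ pronounced as expected"
    parts = [
        f"{name} should be {want}, heard {got}"
        for name, (want, got) in diffs.items()
    ]
    return f"/{target}/ heard as /{produced}/: " + "; ".join(parts)


if __name__ == "__main__":
    print(natural_class("strident", True))
    print(feedback("th", "s"))
    print(feedback("dh", "d"))
```

In this sketch a substitution of "s" for "th" is reported as a single mismatch in one feature rather than as an opaque phoneme error, which is the kind of sub-phonemic diagnosis the abstract describes.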
Pages: 98-108
Number of pages: 11