Context-dependent Duration Modeling with Backoff Strategy and Look-up Tables for Pronunciation Assessment and Mispronunciation Detection
Authors:
Li, Hongyan [1]
Huang, Shen [1]
Wang, Shijin [1]
Xu, Bo [1]
Affiliation:
[1] Chinese Acad Sci, Inst Automat, Digital Content Technol Res Ctr, Beijing 100190, Peoples R China
This paper studies contextual modeling of duration information in order to improve pronunciation assessment and mispronunciation detection. The main ideas are: 1) the relations among durations in a sequence are extended with different levels of contextual constraints and brought into a unified framework; 2) a backoff mechanism is introduced to address data sparseness and unbalanced distributions; 3) instead of traditional parametric functions, empirical duration distributions are modeled discretely with look-up tables, which improves model precision and speeds up computation. Experimental results show the effectiveness of these methods. Compared with the conventional context-independent baseline, the proposed word-dependent duration models yield an absolute improvement of 0.0782 in CC (correlation coefficient) for pronunciation assessment and an absolute reduction of 4.58% in EER (equal error rate) for mispronunciation detection.
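To make ideas 2) and 3) concrete, the following is a minimal Python sketch of a look-up-table duration model with backoff. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the table layout, the backoff criterion, or any smoothing. The class name `DurationLUT`, the parameters `max_frames`, `min_count`, and `alpha`, the two-level context hierarchy (word-dependent, then context-independent), the count-threshold backoff rule, and the additive smoothing are all hypothetical choices.

```python
import math
from collections import Counter


class DurationLUT:
    """Discrete duration model: one histogram (look-up table) per context.

    Assumed, not taken from the paper: durations are integer frame counts,
    backoff is triggered by a minimum sample count, and probabilities use
    additive smoothing.
    """

    def __init__(self, max_frames=100, min_count=5, alpha=1.0):
        self.max_frames = max_frames   # durations above this share the last bin
        self.min_count = min_count     # samples required to trust a context
        self.alpha = alpha             # additive smoothing constant
        self.tables = {}               # context key -> Counter over duration bins

    @staticmethod
    def _contexts(phone, word):
        # Ordered from most specific to least specific.
        return [(phone, word), (phone,)]

    def _bin(self, frames):
        # Clamp the duration into the table range [0, max_frames].
        return min(max(int(frames), 0), self.max_frames)

    def add(self, phone, word, frames):
        # Every training sample updates the table at each context level.
        b = self._bin(frames)
        for key in self._contexts(phone, word):
            self.tables.setdefault(key, Counter())[b] += 1

    def select_context(self, phone, word):
        # Back off until a context has enough training samples.
        for key in self._contexts(phone, word):
            if sum(self.tables.get(key, Counter()).values()) >= self.min_count:
                return key
        return (phone,)  # least specific level, even if sparse

    def log_prob(self, phone, word, frames):
        # Scoring is a table look-up plus smoothing, with no density evaluation.
        hist = self.tables.get(self.select_context(phone, word), Counter())
        total = sum(hist.values())
        n_bins = self.max_frames + 1
        p = (hist[self._bin(frames)] + self.alpha) / (total + self.alpha * n_bins)
        return math.log(p)


# Toy data: phone "ae" is well observed in "cat" but seen only once in "bat".
model = DurationLUT()
for frames in [8, 9, 9, 10, 9, 8]:
    model.add("ae", "cat", frames)
model.add("ae", "bat", 20)

print(model.select_context("ae", "cat"))  # word-dependent table is used
print(model.select_context("ae", "bat"))  # backs off to the phone-level table
```

In this sketch a sparse word-dependent context falls back to the context-independent table for the same phone, so rare words still receive a usable duration score, while frequent words keep their more specific distribution.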