Improving the accuracy of the speech synthesis based phonetic alignment using multiple acoustic features

Cited by: 0

Authors:

Paulo, S [1]

Oliveira, LC [1]

Affiliations:

[1] IST, INESC ID, Spoken Language Syst Lab, P-1000029 Lisbon, Portugal

Source:

COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS | 2003, Vol. 2721

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced-alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterances. However, without a careful choice of the acoustic features used in this procedure, it can perform poorly when applied to continuous speech utterances. In this paper we propose a new method to select the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces the segment boundary location errors.
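
As an illustration only, the idea in the abstract can be sketched in Python. The abstract does not name the alignment algorithm or the selection criterion, so this sketch assumes dynamic time warping (DTW) over per-frame feature vectors and assumes that the "best" feature for a class pair is the one with the lowest mean boundary error on some reference data. The names `dtw_path`, `map_boundaries`, and `best_feature_per_class_pair` are hypothetical and do not come from the paper.

```python
import math


def dtw_path(synth, natural):
    """Return a minimum-cost monotonic alignment path as (synth_idx, natural_idx) pairs.

    Assumption: frames are equal-length tuples of floats compared by Euclidean distance.
    """
    n, m = len(synth), len(natural)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(synth[i - 1], natural[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end of both sequences to the start.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def map_boundaries(path, synth_boundaries):
    """Map each boundary frame of the synthetic signal to the first natural frame aligned with it."""
    first_match = {}
    for si, nj in path:
        first_match.setdefault(si, nj)
    return [first_match[b] for b in synth_boundaries]


def best_feature_per_class_pair(errors):
    """Pick, for each (left_class, right_class) pair, the feature with the lowest mean error.

    Assumption: `errors` maps class pairs to {feature_name: [absolute boundary errors]}.
    """
    return {
        pair: min(by_feature, key=lambda f: sum(by_feature[f]) / len(by_feature[f]))
        for pair, by_feature in errors.items()
    }


# Toy data: one-dimensional "features" with a segment boundary at synthetic frame 2.
synth = [(0.0,), (0.0,), (1.0,), (1.0,)]
natural = [(0.0,), (0.0,), (0.0,), (1.0,), (1.0,), (1.0,)]
boundaries = map_boundaries(dtw_path(synth, natural), [2])
```

In this toy case the boundary known at synthetic frame 2 is transferred to natural frame 3. Per-pair selection would then run the same alignment once per candidate feature set and keep, for each class pair, whichever feature placed the boundaries closest to a reference segmentation.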


Pages: 31-39

Page count: 9
