Improving the accuracy of the speech synthesis based phonetic alignment using multiple acoustic features

Cited by: 0

Authors:

Paulo, S [1]

Oliveira, LC [1]

Affiliations:

[1] IST, INESC ID, Spoken Language Syst Lab, P-1000029 Lisbon, Portugal

Source:

COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS | 2003, Vol. 2721

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced-alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterances. However, without a careful choice of the acoustic features used in this procedure, it can perform poorly when applied to continuous speech utterances. In this paper we propose a new method to select the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces the segment boundary location errors.
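The synthesis-based alignment idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the alignment algorithm, so dynamic time warping (DTW) is assumed here, and the function names `dtw_path` and `map_boundaries` are hypothetical. The sketch uses a single Euclidean distance over one feature vector per frame; it does not implement the paper's per-class-pair feature selection.

```python
import math

def dtw_path(synth, spoken):
    """Return the DTW warping path between two sequences of feature vectors.

    Assumption: each frame is a list of floats of equal length, and the
    local distance is plain Euclidean distance.
    """
    n, m = len(synth), len(spoken)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning synth[:i] with spoken[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(synth[i - 1], spoken[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])

    # Backtrack from the end, always stepping to the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path

def map_boundaries(path, synth_boundaries):
    """Map boundary frame indices known in the synthetic signal to the
    first spoken frame that the warping path pairs with each of them."""
    first_match = {}
    for s, t in path:
        first_match.setdefault(s, t)
    return [first_match[b] for b in synth_boundaries]

# Toy one-dimensional features: the segment boundary sits at synthetic frame 2.
synth = [[0.0], [0.0], [1.0], [1.0]]
spoken = [[0.0], [0.0], [0.0], [1.0], [1.0], [1.0]]
path = dtw_path(synth, spoken)
spoken_boundaries = map_boundaries(path, [2])
```

Because the synthesizer knows where each phonetic segment starts in its own output, transferring those boundaries through the warping path gives boundary estimates for the spoken utterance without any annotated training data.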


Pages: 31-39

Number of pages: 9

