Improving the accuracy of the speech synthesis based phonetic alignment using multiple acoustic features

Cited by: 0
Authors
Paulo, S [1]
Oliveira, LC [1]
Affiliations
[1] IST, INESC ID, Spoken Language Syst Lab, P-1000029 Lisbon, Portugal
Source
COMPUTATIONAL PROCESSING OF THE PORTUGUESE LANGUAGE, PROCEEDINGS | 2003 / Vol. 2721
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The phonetic alignment of spoken utterances for speech research is commonly performed by HMM-based speech recognizers in forced-alignment mode, but training the phonetic segment models requires considerable amounts of annotated data. When no such material is available, a possible solution is to synthesize the same phonetic sequence and align the resulting speech signal with the spoken utterances. However, without a careful choice of the acoustic features used in this procedure, it can perform poorly when applied to continuous speech utterances. In this paper we propose a new method to select the best features to use in the alignment procedure for each pair of phonetic segment classes. The results show that this selection considerably reduces the segment boundary location errors.
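The abstract does not name the algorithm used to align the synthesized signal with the spoken one. A common choice for this kind of task is dynamic time warping (DTW), and the sketch below assumes it purely for illustration. The function names `dtw_path` and `map_boundary`, the Euclidean frame distance, and the toy one-dimensional features are all hypothetical and are not taken from the paper. The idea is that segment boundaries are known in the synthetic signal, so they can be carried over to the spoken signal along the warping path.

```python
import math


def dtw_path(ref, test):
    """Align two sequences of feature vectors with plain DTW.

    ref  : frames of the synthesized signal (boundaries known)
    test : frames of the spoken utterance (boundaries wanted)
    Returns (total_cost, path), where path is a list of (ref_idx, test_idx).
    Assumption: Euclidean distance between frames; the paper may use others.
    """
    n, m = len(ref), len(test)

    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(ref[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])

    # Backtrack from the end of both sequences to recover the warping path.
    i, j = n, m
    path = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path


def map_boundary(path, ref_frame):
    """Map a boundary frame of the synthetic signal to the first spoken frame aligned with it."""
    return min(j for i, j in path if i == ref_frame)


# Toy example: one feature per frame, segment change at ref frame 2.
synthetic = [[0.0], [0.0], [1.0], [1.0]]
spoken = [[0.0], [0.0], [0.0], [1.0], [1.0]]
total_cost, path = dtw_path(synthetic, spoken)
boundary = map_boundary(path, 2)
```

In this setting the choice of frame features drives the quality of the path, which is where a per-class feature selection such as the one the abstract proposes would plug in: the `dist` function would be computed over different features depending on the pair of phonetic classes around each boundary.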
Pages: 31-39
Page count: 9
Related papers
50 in total
  • [31] ACOUSTIC-PHONETIC FEATURE BASED DIALECT IDENTIFICATION IN HINDI SPEECH
    Sinha, Shweta
    Jain, Aruna
    Agrawal, S. S.
    INTERNATIONAL JOURNAL ON SMART SENSING AND INTELLIGENT SYSTEMS, 2015, 8 (01) : 235 - 254
  • [32] Utterance Verification-Based Dysarthric Speech Intelligibility Assessment Using Phonetic Posterior Features
    Fritsch, Julian
    Magimai-Doss, Mathew
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 224 - 228
  • [33] Using Forced Alignment for Automatic Acoustic-phonetic Segmentation of Aphasic Discourse
    Lee, A.
    Kong, A.
    Law, S.
    50TH ACADEMY OF APHASIA MEETING, 2012, 61 : 92 - +
  • [34] Recognizing Speech Emotion Based on Acoustic Features Using Machine Learning
    Nasim, Md Abu Saleh
    Chowdory, Md Rakibul Hassan
    Dey, Ashim
    Das, Annesha
    13TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTER SCIENCE AND INFORMATION SYSTEMS (ICACSIS 2021), 2021, : 95 - +
  • [35] Improving Speech to Text Alignment Based on Repetition Detection for Dysarthric Speech
    Diwakar, G.
    Karjigi, Veena
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2020, 39 (11) : 5543 - 5567
  • [37] Entropy-Based Sentence Selection for Speech Synthesis Using Phonetic and Prosodic Contexts
    Nose, Takashi
    Arao, Yusuke
    Kobayashi, Takao
    Sugiura, Komei
    Shiga, Yoshinori
    Ito, Akinori
    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5, 2015, : 3491 - 3495
  • [38] Towards capturing fine phonetic variation in speech using articulatory features
    Scharenborg, Odette
    Wan, Vincent
    Moore, Roger K.
    SPEECH COMMUNICATION, 2007, 49 (10-11) : 811 - 826
  • [39] Model-based Articulatory Phonetic Features for Improved Speech Recognition
    Huang, Guangpu
    Er, Meng Joo
    2012 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2012,
  • [40] Improving Text-Independent Forced Alignment to Support Speech-Language Pathologists with Phonetic Transcription
    Li, Ying
    Wohlan, Bryce Johannas
    Pham, Duc-Son
    Chan, Kit Yan
    Ward, Roslyn
    Hennessey, Neville
    Tan, Tele
    SENSORS, 2023, 23 (24)