Investigating accuracy of pitch-accent annotations in neural network-based

Cited by: 0
Authors
Luong, Hieu-Thi [1]
Wang, Xin [1]
Yamagishi, Junichi [1]
Nishizawa, Nobuyuki [2]
Affiliations
[1] Natl Inst Informat, Tokyo, Japan
[2] KDDI Res Inc, Saitama, Japan
Keywords
speech synthesis; deep neural network; Japanese prosody; WaveNet
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We investigated the impact of noisy linguistic features on the performance of a neural-network-based Japanese speech synthesis system that uses a WaveNet vocoder. We compared an ideal system, which uses manually corrected linguistic features (including phoneme and prosodic information) in both the training and test sets, against several other systems that use corrupted linguistic features. Both subjective and objective results show that corrupted linguistic features, especially those in the test set, had a statistically significant effect on the ideal system's performance because of the mismatch between the training and test conditions. Interestingly, although an utterance-level Turing test showed that listeners had difficulty distinguishing synthetic speech from natural speech, it also indicated that adding noise to the linguistic features in the training set can partially reduce the effect of the mismatch, regularize the model, and help the system perform better when the linguistic features of the test set are noisy.
Pages: 37-41
Number of pages: 5