Speech vocoding for laboratory phonology

Cited by: 3
Authors
Cernak, Milos [1]
Benus, Stefan [2, 3]
Lazaridis, Alexandros [1]
Affiliations
[1] Idiap Res Inst, Martigny, Switzerland
[2] Univ Nitra, Nitra, Slovakia
[3] Slovak Acad Sci, Inst Informat, Bratislava, Slovakia
Source
COMPUTER SPEECH AND LANGUAGE | 2017 / Vol. 42
Funding
Swiss National Science Foundation
Keywords
Phonological speech representation; Parametric speech synthesis; Laboratory phonology; LANGUAGE;
DOI
10.1016/j.csl.2016.10.001
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Using phonological speech vocoding, we propose a platform for exploring relations between phonology and speech processing, and in broader terms, for exploring relations between the abstract and physical structures of a speech signal. Our goal is to make a step towards bridging phonology and speech processing and to contribute to the program of Laboratory Phonology. We show three application examples for laboratory phonology: compositional phonological speech modelling, a comparison of phonological systems and an experimental phonological parametric text-to-speech (TTS) system. The featural representations of the following three phonological systems are considered in this work: (i) Government Phonology (GP), (ii) the Sound Pattern of English (SPE), and (iii) the extended SPE (eSPE). Comparing GP- and eSPE-based vocoded speech, we conclude that the latter achieves slightly better results than the former. However, GP - the most compact phonological speech representation - performs comparably to the systems with a higher number of phonological features. The parametric TTS based on phonological speech representation, and trained from an unlabelled audiobook in an unsupervised manner, achieves intelligibility of 85% of the state-of-the-art parametric speech synthesis. We envision that the presented approach paves the way for researchers in both fields to form meaningful hypotheses that are explicitly testable using the concepts developed and exemplified in this paper. On the one hand, laboratory phonologists might test the applied concepts of their theoretical models, and on the other hand, the speech processing community may utilize the concepts developed for the theoretical phonological models for improvements of the current state-of-the-art applications. (C) 2016 Elsevier Ltd. All rights reserved.
Pages: 100-121
Number of pages: 22
Related papers
50 in total
  • [1] Effects of vocoding and intelligibility on the cerebral response to speech
    Strelnikov, Kuzma
    Massida, Zoe
    Rouger, Julien
    Belin, Pascal
    Barone, Pascal
    BMC NEUROSCIENCE, 2011, 12
  • [3] The effects of noise vocoding on speech quality perception
    Anderson, Melinda C.
    Arehart, Kathryn H.
    Kates, James M.
    HEARING RESEARCH, 2014, 309 : 75 - 83
  • [4] DEEPA: A DEEP NEURAL ANALYZER FOR SPEECH AND SINGING VOCODING
    Nikonorov, Sergey
    Sisman, Berrak
    Zhang, Mingyang
    Li, Haizhou
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 618 - 625
  • [5] The Laboratory Phonology: At the interface of phonology and phonetics
    Wallet, Lucille
    LANGAGES, 2015, (198) : 133 - +
  • [6] Analysis by Adversarial Synthesis - A Novel Approach for Speech Vocoding
    Mustafa, Ahmed
    Biswas, Arijit
    Bergler, Christian
    Schottenhamml, Julia
    Maier, Andreas
    INTERSPEECH 2019, 2019, : 191 - 195
  • [7] THE PHONOLOGY OF THE PAUSE OF SPEECH
    WEINRICH, H
    PHONETICA, 1961, 7 (01) : 4 - 18
  • [8] Laboratory phonology 8
    Padgett, Jaye
    LANGUAGE, 2010, 86 (04) : 957 - 960
  • [9] Laboratory phonology 10
    Coleman, John
    PHONOLOGY, 2012, 29 (02) : 331 - 336
  • [10] Using casual speech phonology in synthetic speech
    Shockey, Linda
    ARCHIVES OF ACOUSTICS, 2007, 32 (01) : 101 - 109