Analysis and Synthesis of Speech Using an Adaptive Full-Band Harmonic Model

Cited by: 35
Authors
Degottex, Gilles [1, 2]
Stylianou, Yannis [1, 2]
Affiliations
[1] Univ Crete, Dept Comp Sci, GR-71003 Iraklion, Greece
[2] FORTH, Inst Comp Sci, GR-70013 Iraklion, Greece
Funding
Swiss National Science Foundation;
Keywords
Voice model; sinusoidal model; harmonic model; non-stationary;
DOI
10.1109/TASL.2013.2266772
Chinese Library Classification number
O42 [Acoustics];
Subject classification code
070206; 082403;
Abstract
Voice models often use frequency limits to split the speech spectrum into two or more voiced/unvoiced frequency bands. However, in voice production, the amplitude spectrum of the voiced source decreases smoothly, without any abrupt frequency limit. Accordingly, multiband models struggle to estimate these limits and, as a consequence, artifacts can degrade the perceived quality. Using a linear frequency basis adapted to the non-stationarities of the speech signal, the Fan Chirp Transformation (FChT) has demonstrated harmonicity at frequencies higher than usually observed from the DFT, which motivates full-band modeling. The previously proposed Adaptive Quasi-Harmonic model (aQHM) offers even more flexibility than the FChT by using a non-linear frequency basis. In the current paper, exploiting the properties of aQHM, we describe a full-band Adaptive Harmonic Model (aHM), along with detailed descriptions of its corresponding algorithms for the estimation of harmonics up to the Nyquist frequency. Formal listening tests show that the speech reconstructed using aHM is nearly indistinguishable from the original speech. Experiments with synthetic signals also show that the proposed aHM globally outperforms previous sinusoidal and harmonic models in terms of precision in estimating the sinusoidal parameters. Looking ahead, such precision is useful for building higher-level models upon the sinusoidal parameters, such as spectral envelopes for speech synthesis.
Pages: 2085-2095
Page count: 11
Related papers
50 in total
  • [21] Objective Quality Assessment of Echo-Impaired Full-Band Speech Signals
    Avila, Flavio R.
    Nunes, Leonardo O.
    Biscainho, Luiz W. P.
    Tygel, Alan F.
    Lee, Bowon
    2014 INTERNATIONAL TELECOMMUNICATIONS SYMPOSIUM (ITS), 2014,
  • [22] ANALYSIS/SYNTHESIS OF SPEECH BASED ON AN ADAPTIVE QUASI-HARMONIC PLUS NOISE MODEL
    Pantazis, Yannis
    Tzedakis, Georgios
    Rosec, Olivier
    Stylianou, Yannis
    2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 4246 - 4249
  • [23] FULLSUBNET: A FULL-BAND AND SUB-BAND FUSION MODEL FOR REAL-TIME SINGLE-CHANNEL SPEECH ENHANCEMENT
    Hao, Xiang
    Su, Xiangdong
    Horaud, Radu
    Li, Xiaofei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 6633 - 6637
  • [24] Full-band Monte Carlo model for hole transport in silicon
    Journal of Applied Physics, 1997, 81 (05):
  • [25] A Full Band adaptive Harmonic Model Based Speaker Identity Transformation using Radial Basis Function
    Chadha, Ankita
    Nirmal, Jagannath
    PROCEEDINGS OF 2017 11TH INTERNATIONAL CONFERENCE ON INTELLIGENT SYSTEMS AND CONTROL (ISCO 2017), 2017, : 217 - 223
  • [26] A full-band Monte Carlo model for hole transport in silicon
    Jallepalli, S
    Rashed, M
    Shih, WK
    Maziar, CM
    Tasch, AF
    JOURNAL OF APPLIED PHYSICS, 1997, 81 (05) : 2250 - 2255
  • [27] On the Use of Absolute Threshold of Hearing-based Loss for Full-band Speech Enhancement
    Mars, Rohith
    Das, Rohan Kumar
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 458 - 462
  • [28] MULTI-CHANNEL NARROW-BAND DEEP SPEECH SEPARATION WITH FULL-BAND PERMUTATION INVARIANT TRAINING
    Quan, Changsheng
    Li, Xiaofei
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 541 - 545
  • [29] FSI-Net: A dual-stage full- and sub-band integration network for full-band speech enhancement
    Yu, Guochen
    Wang, Hui
    Li, Andong
    Liu, Wenzhe
    Zhang, Yuan
    Wang, Yutian
    Zheng, Chengshi
    APPLIED ACOUSTICS, 2023, 211
  • [30] DEEPFILTERNET: A LOW COMPLEXITY SPEECH ENHANCEMENT FRAMEWORK FOR FULL-BAND AUDIO BASED ON DEEP FILTERING
    Schroeter, Hendrik
    Escalante-B, Alberto N.
    Rosenkranz, Tobias
    Maier, Andreas
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 7407 - 7411