Analysis and Synthesis of Speech Using an Adaptive Full-Band Harmonic Model

Cited by: 35
Authors
Degottex, Gilles [1,2]
Stylianou, Yannis [1,2]
Affiliations
[1] Univ Crete, Dept Comp Sci, GR-71003 Iraklion, Greece
[2] FORTH, Inst Comp Sci, GR-70013 Iraklion, Greece
Funding
Swiss National Science Foundation
Keywords
Voice model; sinusoidal model; harmonic model; non-stationary
DOI
10.1109/TASL.2013.2266772
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
Voice models often use frequency limits to split the speech spectrum into two or more voiced/unvoiced frequency bands. From the standpoint of voice production, however, the amplitude spectrum of the voiced source decreases smoothly, without any abrupt frequency limit. Accordingly, multiband models struggle to estimate these limits and, as a consequence, artifacts can degrade the perceived quality. Using a linear frequency basis adapted to the non-stationarities of the speech signal, the Fan Chirp Transformation (FChT) has demonstrated harmonicity at frequencies higher than usually observed with the DFT, which motivates full-band modeling. The previously proposed Adaptive Quasi-Harmonic Model (aQHM) offers even more flexibility than the FChT by using a non-linear frequency basis. In the current paper, exploiting the properties of aQHM, we describe a full-band Adaptive Harmonic Model (aHM), along with detailed descriptions of its corresponding algorithms for estimating harmonics up to the Nyquist frequency. Formal listening tests show that speech reconstructed using aHM is nearly indistinguishable from the original speech. Experiments with synthetic signals also show that the proposed aHM globally outperforms previous sinusoidal and harmonic models in the precision of the estimated sinusoidal parameters. Looking ahead, such precision is valuable for building higher-level models on top of the sinusoidal parameters, such as spectral envelopes for speech synthesis.
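To make the idea of a full-band harmonic representation concrete, the sketch below synthesizes a signal whose harmonics follow a time-varying fundamental frequency all the way up to the Nyquist frequency. It is an illustration only, not the aHM estimation algorithm from the paper: the function names `harmonic_count` and `synthesize_harmonics`, the per-sample f0 track, and the constant harmonic amplitudes are all assumptions introduced here.

```python
import numpy as np


def harmonic_count(f0_hz, fs):
    """Number of harmonics k*f0 that lie strictly below the Nyquist frequency."""
    return int(np.ceil(fs / (2.0 * f0_hz))) - 1


def synthesize_harmonics(f0, amps, fs):
    """Sum harmonics of a time-varying fundamental (hypothetical helper).

    f0   : per-sample fundamental frequency track in Hz
    amps : constant amplitude for harmonics 1, 2, ... (an assumption;
           a real model would let amplitudes vary over time)
    fs   : sampling rate in Hz
    """
    f0 = np.asarray(f0, dtype=float)
    # Fundamental phase as the running integral of instantaneous frequency,
    # so the frequency basis follows the f0 curve instead of staying fixed.
    phase0 = 2.0 * np.pi * np.cumsum(f0) / fs
    signal = np.zeros_like(f0)
    for k, a_k in enumerate(amps, start=1):
        # Mute harmonic k at samples where it would reach or exceed Nyquist.
        in_band = (k * f0) < (fs / 2.0)
        signal += a_k * np.cos(k * phase0) * in_band
    return signal
```

For example, `synthesize_harmonics(np.linspace(100.0, 150.0, 16000), np.ones(harmonic_count(150.0, 16000)), 16000)` produces one second of a gliding harmonic tone with every harmonic kept below 8 kHz. Because each harmonic phase is an integer multiple of the same integrated f0, the partials stay exactly harmonic even while the pitch changes.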
Pages: 2085-2095
Number of pages: 11