LOW BIT-RATE SPEECH CODING WITH VQ-VAE AND A WAVENET DECODER

Authors
Garbacea, Cristina [1, 2]
van den Oord, Aaron [2]
Li, Yazhe [2]
Lim, Felicia S. C. [3]
Luebs, Alejandro [3]
Vinyals, Oriol [2]
Walters, Thomas C. [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] DeepMind, London, England
[3] Google, San Francisco, CA USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
Speech coding; low bit-rate; generative models; WaveNet; VQ-VAE
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits perceptual quality which is around halfway between the MELP codec at 2.4 kbps and AMR-WB codec at 23.05 kbps. In addition, when training on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps.
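As a rough illustration of how a vector-quantized codec arrives at a figure such as 1.6 kbps, the bit rate can be estimated as the number of code indices sent per second times the bits needed per index. The sketch below is not taken from this record: the function name `vq_bitrate_bps` and the sample rate, downsampling factor, and codebook size are hypothetical values chosen only because they multiply out to 1600 bits per second.

```python
import math


def vq_bitrate_bps(sample_rate_hz, downsample_factor, codebook_size, num_codebooks=1):
    """Estimate the bit rate (bits/s) of a stream of VQ code indices.

    Assumes each index is sent with a fixed-length code of
    log2(codebook_size) bits and no further entropy coding.
    """
    codes_per_second = sample_rate_hz / downsample_factor
    bits_per_code = math.log2(codebook_size)
    return codes_per_second * bits_per_code * num_codebooks


# Hypothetical configuration (an assumption, not stated in the abstract):
# 16 kHz audio, one latent every 80 samples, a 256-entry codebook.
rate = vq_bitrate_bps(sample_rate_hz=16000, downsample_factor=80, codebook_size=256)
print(rate)  # 1600.0

# Ratio against the 23.05 kbps AMR-WB rate quoted in the abstract.
print(23050 / rate)
```

Other combinations of latent rate and codebook size give the same total, so this arithmetic alone does not identify the configuration the authors used.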
Pages: 735-739
Page count: 5