LOW BIT-RATE SPEECH CODING WITH VQ-VAE AND A WAVENET DECODER

Cited by: 0
Authors
Garbacea, Cristina [1, 2]
van den Oord, Aaron [2]
Li, Yazhe [2]
Lim, Felicia S. C. [3]
Luebs, Alejandro [3]
Vinyals, Oriol [2]
Walters, Thomas C. [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] DeepMind, London, England
[3] Google, San Francisco, CA USA
Keywords
Speech coding; low bit-rate; generative models; WaveNet; VQ-VAE
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits perceptual quality which is around halfway between the MELP codec at 2.4 kbps and AMR-WB codec at 23.05 kbps. In addition, when training on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps.
Pages: 735-739
Page count: 5
Related papers
50 in total
  • [1] Pitch quantization in low bit-rate speech coding
    Eriksson, T
    Kang, HG
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999: 489-492
  • [2] SIGNAL MODELS FOR LOW BIT-RATE CODING OF SPEECH
    FLANAGAN, JL
    ISHIZAKA, K
    SHIPLEY, KL
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1980, 68 (03): 780-791
  • [3] Techniques of very low bit-rate speech coding
    Cui, HJ
    Tang, K
    Zhao, M
    Zhang, X
    CHINESE JOURNAL OF ELECTRONICS, 2004, 13 (01): 63-65
  • [4] SPEECH RECONSTRUCTION FOR MFCC-BASED LOW BIT-RATE SPEECH CODING
    Jiang Wenbin
    Ying Rendong
    Liu Peilin
    2014 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO WORKSHOPS (ICMEW), 2014
  • [5] Study on speech coding with half bit-rate
    Chen, Yuyuan
    Chen, Yongsheng
    Cui, Ying
    Zheng, Zhijun
    Tiedao Xuebao/Journal of the China Railway Society, 20 (02): 71-74
  • [6] WAVENET BASED LOW RATE SPEECH CODING
    Kleijn, W. Bastiaan
    Lim, Felicia S. C.
    Luebs, Alejandro
    Skoglund, Jan
    Stimberg, Florian
    Wang, Quan
    Walters, Thomas C.
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018: 676-680
  • [7] Bandwidth extension of narrowband speech for low bit-rate wideband coding
    Valin, JM
    Lefebvre, R
    2000 IEEE WORKSHOP ON SPEECH CODING, PROCEEDINGS: MEETING THE CHALLENGES OF THE NEW MILLENNIUM, 2000: 130-132
  • [8] Low bit-rate speech coding based on an improved sinusoidal model
    Ahmadi, S
    Spanias, AS
    SPEECH COMMUNICATION, 2001, 34 (04): 369-390
  • [9] ADAPTIVE DENSITY PULSE EXCITATION FOR LOW BIT-RATE SPEECH CODING
    AKAMINE, M
    MISEKI, K
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1995, E78A (02): 199-207
  • [10] Improving low bit-rate coding
    Rumsey, Francis
    AES: Journal of the Audio Engineering Society, 2010, 58 (12): 1116-1121