LOW BIT-RATE SPEECH CODING WITH VQ-VAE AND A WAVENET DECODER

Cited by: 0
Authors
Garbacea, Cristina [1, 2]
van den Oord, Aaron [2]
Li, Yazhe [2]
Lim, Felicia S. C. [3]
Luebs, Alejandro [3]
Vinyals, Oriol [2]
Walters, Thomas C. [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] DeepMind, London, England
[3] Google, San Francisco, CA USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019
Keywords
Speech coding; low bit-rate; generative models; WaveNet; VQ-VAE;
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits perceptual quality which is around halfway between the MELP codec at 2.4 kbps and AMR-WB codec at 23.05 kbps. In addition, when training on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps.
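The bit rates quoted in the abstract can be put side by side with a little arithmetic. The following is a minimal sketch, not taken from the paper: it assumes 1 kbps = 1000 bits per second, and the names `BITRATES_KBPS`, `bytes_per_minute`, and `rate_ratio` are hypothetical helpers introduced only for illustration. It says nothing about perceptual quality, only about payload size.

```python
# Bit rates quoted in the abstract, in kilobits per second.
# The dictionary keys are illustrative labels, not identifiers from the paper.
BITRATES_KBPS = {
    "VQ-VAE + WaveNet": 1.6,
    "MELP": 2.4,
    "AMR-WB": 23.05,
}


def bytes_per_minute(kbps: float) -> float:
    """Payload size of one minute of coded speech.

    Assumes 1 kbps = 1000 bit/s and ignores any container or packet overhead.
    """
    return kbps * 1000 * 60 / 8


def rate_ratio(codec: str, reference: str = "VQ-VAE + WaveNet") -> float:
    """How many times more bits `codec` spends than `reference`."""
    return BITRATES_KBPS[codec] / BITRATES_KBPS[reference]


if __name__ == "__main__":
    for name, kbps in BITRATES_KBPS.items():
        print(
            f"{name}: {bytes_per_minute(kbps):.0f} bytes per minute, "
            f"{rate_ratio(name):.2f}x the 1.6 kbps rate"
        )
```

Under these assumptions, the comparison in the abstract is between codecs whose bit budgets differ by a factor of roughly 1.5 (MELP) and roughly 14 (AMR-WB) relative to the 1.6 kbps model.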
Pages: 735 - 739
Page count: 5
Related papers
50 entries in total
  • [21] Low bit-rate speech coding based on multicomponent AFM signal model
    Bansal M.
    Sircar P.
    International Journal of Speech Technology, 2018, 21 (4) : 783 - 795
  • [22] Steganography integrated into linear predictive coding for low bit-rate speech codec
    Peng Liu
    Songbin Li
    Haiqiang Wang
    Multimedia Tools and Applications, 2017, 76 : 2837 - 2859
  • [23] 3.35kb/s low bit-rate speech coding algorithm
    Li, Yue
    Tang, Kun
    Cui, Huijuan
    Du, Wen
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2004, 44 (10) : 1410 - 1413
  • [24] Algorithms for Low Bit-Rate Coding with Adaptation to Statistical Characteristics of Speech Signal
    Saveliev, Anton
    Basov, Oleg
    Ronzhin, Andrey
    Ronzhin, Alexander
    SPEECH AND COMPUTER (SPECOM 2015), 2015, 9319 : 65 - 72
  • [25] Phase modelling of speech excitation for low bit-rate sinusoidal transform coding
    Sun, XQ
    Plante, F
    Cheetham, BMG
    Wong, KWT
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 1691 - 1694
  • [26] Steganography integrated into linear predictive coding for low bit-rate speech codec
    Liu, Peng
    Li, Songbin
    Wang, Haiqiang
    MULTIMEDIA TOOLS AND APPLICATIONS, 2017, 76 (02) : 2837 - 2859
  • [27] Very low bit-rate video coding
    Kocharoen, P
    Ahmed, KM
    TENCON 2004 - 2004 IEEE REGION 10 CONFERENCE, VOLS A-D, PROCEEDINGS: ANALOG AND DIGITAL TECHNIQUES IN ELECTRICAL ENGINEERING, 2004, : A610 - A613
  • [28] Low bit-rate speed coding technology
    Tasaki, Hirohisa
    Takahashi, Shin'ya
    Mitsubishi Electric Advance, 1998, 84 : 17 - 19
  • [29] Low bit-rate speed coding technology
    Tasaki, H
    Takahashi, S
    MITSUBISHI ELECTRIC ADVANCE, 1998, 84 : 17 - 19
  • [30] LOW BIT-RATE CODING OF MOVING IMAGES
    HASKELL, BG
    PEARSON, D
    YAMAMOTO, H
    IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 1987, 5 (07) : 1065 - 1067