LOW BIT-RATE SPEECH CODING WITH VQ-VAE AND A WAVENET DECODER

Authors
Garbacea, Cristina [1, 2]
van den Oord, Aaron [2]
Li, Yazhe [2]
Lim, Felicia S. C. [3]
Luebs, Alejandro [3]
Vinyals, Oriol [2]
Walters, Thomas C. [2]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] DeepMind, London, England
[3] Google, San Francisco, CA USA
Source
2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019年
Keywords
Speech coding; low bit-rate; generative models; WaveNet; VQ-VAE
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
To transmit and store speech signals efficiently, speech codecs create a minimally redundant representation of the input signal, which the receiver then decodes with the best possible perceptual quality. In this work, we demonstrate that a neural network architecture based on a VQ-VAE with a WaveNet decoder can perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent, speaker-independent model trained on the LibriSpeech corpus and coding audio at 1.6 kbps achieves perceptual quality roughly halfway between the MELP codec at 2.4 kbps and the AMR-WB codec at 23.05 kbps. In addition, when the model is trained on high-quality recorded speech with the test speaker included in the training set, coding speech at 1.6 kbps produces output of perceptual quality similar to that of AMR-WB at 23.05 kbps.
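The abstract gives the 1.6 kbps rate of the VQ-VAE code but not how such a rate arises. The minimal Python sketch below illustrates the idea under loudly stated assumptions: the configuration (one 256-entry codebook, i.e. 8 bits per index, and 200 latent vectors per second) is hypothetical and chosen only because it yields 1.6 kbps; it is not taken from the paper. The helper names `bitrate_bps`, `quantize` and `dequantize` are also hypothetical. The sketch covers only the vector-quantization bottleneck (nearest-codebook lookup on the sender, table lookup on the receiver) and does not model the learned encoder or the WaveNet decoder.

```python
import math


def bitrate_bps(latents_per_second, codebook_size, num_codebooks=1):
    """Bits per second emitted by a VQ bottleneck that sends one codebook
    index per codebook for every latent frame (no entropy coding assumed)."""
    bits_per_index = math.log2(codebook_size)
    return latents_per_second * num_codebooks * bits_per_index


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vectors, codebook):
    """Sender side: replace each continuous encoder output by the index of
    its nearest codebook entry. Only these integer indices are transmitted."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(v, codebook[k]))
        for v in vectors
    ]


def dequantize(indices, codebook):
    """Receiver side: look the indices up in the shared codebook; in a
    VQ-VAE the resulting vectors condition the decoder."""
    return [codebook[i] for i in indices]


if __name__ == "__main__":
    # Hypothetical configuration: 256-entry codebook (8 bits), 200 latents/s.
    rate = bitrate_bps(latents_per_second=200, codebook_size=256)
    print(rate)            # 1600.0 bits/s, i.e. 1.6 kbps
    print(23050 / rate)    # AMR-WB at 23.05 kbps uses about 14.4x more bits

    codebook = [[0.0, 0.0], [1.0, 1.0]]
    idx = quantize([[0.1, -0.2], [0.9, 1.2]], codebook)
    print(idx)             # [0, 1]
    print(dequantize(idx, codebook))
```

The design point the sketch makes is that the bit rate depends only on how many indices are sent per second and on the codebook size, not on the dimensionality of the codebook vectors, which is what allows a rich latent space to be transmitted at a very low rate.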
Pages: 735-739 (5 pages)