LOW BIT-RATE SPEECH CODING WITH VQ-VAE AND A WAVENET DECODER

Cited by: 0

Authors:

Garbacea, Cristina [1, 2]

van den Oord, Aaron [2]

Li, Yazhe [2]

Lim, Felicia S. C. [3]

Luebs, Alejandro [3]

Vinyals, Oriol [2]

Walters, Thomas C. [2]

Affiliations:

[1] Univ Michigan, Ann Arbor, MI 48109 USA

[2] DeepMind, London, England

[3] Google, San Francisco, CA USA

Source:

2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2019

Keywords:

Speech coding; low bit-rate; generative models; WaveNet; VQ-VAE

DOI:

Not available

Chinese Library Classification number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In order to efficiently transmit and store speech signals, speech codecs create a minimally redundant representation of the input signal which is then decoded at the receiver with the best possible perceptual quality. In this work we demonstrate that a neural network architecture based on VQ-VAE with a WaveNet decoder can be used to perform very low bit-rate speech coding with high reconstruction quality. A prosody-transparent and speaker-independent model trained on the LibriSpeech corpus coding audio at 1.6 kbps exhibits perceptual quality which is around halfway between the MELP codec at 2.4 kbps and AMR-WB codec at 23.05 kbps. In addition, when training on high-quality recorded speech with the test speaker included in the training set, a model coding speech at 1.6 kbps produces output of similar perceptual quality to that generated by AMR-WB at 23.05 kbps.


Pages: 735-739

Number of pages: 5

