Neurally Optimized Decoder for Low Bitrate Speech Codec

Times cited: 1

Authors:

Kim, Hyung Yong [1,2]

Yoon, Ji Won [1,2]

Cho, Won Ik [1,2]

Kim, Nam Soo [1,2]

Affiliations:

[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea

[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea

Source:

IEEE SIGNAL PROCESSING LETTERS | 2022 / Vol. 29

Keywords:

Decoding; Speech coding; Speech codecs; Bit rate; Encoding; Convolution; Knowledge engineering; generative adversarial network; generative model; attention mechanism; NETWORKS;

DOI:

10.1109/LSP.2021.3132557

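Chinese Library Classification codes:

TM [Electrotechnics]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809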

Abstract:

Recently, conventional neural decoders for speech codecs have shown promising performance. However, they typically require prior knowledge of the decoding process, such as the bit allocation or the dequantization scheme, and therefore do not provide a universal solution for the many different kinds of speech codecs. To address this limitation, we propose a neurally optimized decoder based on a generative model that reconstructs speech directly from the bitstream without any such prior knowledge. The proposed decoder consists of two main components: 1) a dequantization model that groups and dequantizes related bits from the bitstream, and 2) a generative model that restores the speech conditioned on the output of the dequantization model. Experiments with mixed excitation linear prediction (MELP), advanced multi-band excitation (AMBE), and SPEEX at around 2.4 kb/s show that the proposed model outperforms the conventional speech codecs in most objective and subjective evaluations.
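As a rough illustration of the input such a decoder starts from, the Python sketch below computes the frame size as bitrate × frame duration, unpacks a raw bitstream into per-frame vectors of ±1 values without any bit-allocation table, and passes each frame through a toy dense layer standing in for the dequantization model. Everything here is an assumption for illustration, not the authors' architecture: the 22.5 ms frame (the MELP frame length, which gives 54 bits at 2.4 kb/s), the hypothetical `ToyDequantizer` class, its random weights, and the output size of 16.

```python
import math
import random
from typing import List


def bits_per_frame(bitrate_bps: int, frame_us: int) -> int:
    """Bits carried by one codec frame: bitrate x frame duration (integer math)."""
    return bitrate_bps * frame_us // 1_000_000


def unpack_frames(payload: bytes, frame_bits: int) -> List[List[float]]:
    """Split a raw bitstream into per-frame vectors of +1/-1 values.

    No bit-allocation table is used: every bit is an anonymous input
    feature (MSB-first within each byte). Trailing partial frames are dropped.
    """
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    n_frames = len(bits) // frame_bits
    return [
        [2.0 * b - 1.0 for b in bits[f * frame_bits:(f + 1) * frame_bits]]
        for f in range(n_frames)
    ]


class ToyDequantizer:
    """Hypothetical stand-in for a learned dequantization model.

    One dense layer with tanh over all bits of a frame. In a real system the
    weights would be trained jointly with the generative model; here they are random.
    """

    def __init__(self, in_bits: int, out_dim: int, seed: int = 0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_bits)
        self.w = [[rng.gauss(0.0, scale) for _ in range(in_bits)]
                  for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def __call__(self, frame: List[float]) -> List[float]:
        # Weighted sum of all bits in the frame, squashed to (-1, 1).
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, frame)) + bi)
                for row, bi in zip(self.w, self.b)]


# Usage: 2.4 kb/s with an assumed 22.5 ms frame -> 54 bits per frame.
frame_bits = bits_per_frame(2400, 22_500)
payload = bytes(range(27))                  # 216 bits -> exactly 4 frames
frames = unpack_frames(payload, frame_bits)
deq = ToyDequantizer(frame_bits, out_dim=16)
cond = [deq(f) for f in frames]             # one conditioning vector per frame
print(len(frames), len(cond[0]))            # 4 16
```

Because the dense layer sees every bit of a frame, training could in principle let it learn which bits belong together, which is one simple way to realize the "grouping" step; the conditioning vectors would then feed the generative model that synthesizes the waveform.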

Pages: 244-248

Number of pages: 5

Related records (50 in total; 1-10 shown):

[1] A low-power DSP core architecture for low bitrate speech codec
Okuhata, H
Miki, MH
Onoye, T
Shirakawa, I
IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 1998, E81A (08) : 1616 - 1621
[2] A low-power DSP core architecture for low bitrate speech CODEC
Okuhata, H
Miki, MH
Onoye, T
Shirakawa, I
PROCEEDINGS OF THE 1998 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-6, 1998 : 3121 - 3124
[3] A scalable wideband speech codec using the wavelet packet transform based on the internet low bitrate codec
Seto, Koji
Ogunfunmi, Tokunbo
COMPUTER SPEECH AND LANGUAGE, 2019, 54 : 61 - 70
[4] Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Jiang, Xue
Peng, Xiulian
Zhang, Yuan
Lu, Yan
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2024, 18 (08) : 1477 - 1489
[5] Design of a Bitrate Scalable Speech Codec Based on G.723.1
Lee, Joonseok
Kang, Sangwon
Lee, Kangeun
Park, Dongwon
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2005, 24 (06) : 358 - 364
[6] ARCHITECTURE FOR VARIABLE BITRATE NEURAL SPEECH CODEC WITH CONFIGURABLE COMPUTATION COMPLEXITY
Jayashankar, Tejas
Koehler, Thilo
Kalgaonkar, Kaustubh
Xiu, Zhiping
Wu, Jilong
Lin, Ju
Agrawal, Prabhav
He, Qing
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 861 - 865
[7] A low power CELP decoder VLSI architecture with reduced memory requirement for low bit rate speech codec
Suen, AN
Wang, JF
Lin, JL
INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS, 1997 DIGEST OF TECHNICAL PAPERS, 1997 : 214 - 215
[8] SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Liu, Haohe
Xu, Xuenan
Yuan, Yi
Wu, Mengyue
Wang, Wenwu
Plumbley, Mark D.
IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2024, 18 (08) : 1448 - 1461
[9] Low power design for Speech Codec
Okamura, T
Kinoshita, Y
Yoshida, H
Yamane, D
ELEVENTH ANNUAL IEEE INTERNATIONAL ASIC CONFERENCE - PROCEEDINGS, 1998 : 135 - 138
[10] SPEECH ENHANCEMENT FOR LOW BIT RATE SPEECH CODEC
Lin, Ju
Kalgaonkar, Kaustubh
He, Qing
Lei, Xin
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022 : 7777 - 7781