Neurally Optimized Decoder for Low Bitrate Speech Codec

Cited by: 1
Authors
Kim, Hyung Yong [1, 2]
Yoon, Ji Won [1, 2]
Cho, Won Ik [1, 2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea
Keywords
Decoding; Speech coding; Speech codecs; Bit rate; Encoding; Convolution; Knowledge engineering; Generative adversarial network; Generative model; Attention mechanism; Networks
DOI
10.1109/LSP.2021.3132557
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification code
0808; 0809
Abstract
Recently, conventional neural decoders for speech codecs have shown promising performance. However, such a decoder typically requires prior knowledge of the decoding process, such as the bit allocation or the dequantization scheme, so it is not a universal solution across different kinds of speech codecs. To address this limitation, we propose a neurally optimized decoder based on a generative model that directly reconstructs the speech from the bitstream without prior knowledge. The proposed decoder consists mainly of two components: 1) a dequantization model that groups and dequantizes related bits from the bitstream and 2) a generative model that restores the speech conditioned on the output of the dequantization model. Experiments with mixed excitation linear prediction (MELP), advanced multi-band excitation (AMBE), and SPEEX at around 2.4 kb/s show that the proposed model performs better than the conventional speech codecs in most of the objective and subjective evaluations.
Pages: 244-248
Page count: 5
Related papers
50 in total
  • [21] Ultra-Low-Bitrate Speech Coding with Pretrained Transformers
    Siahkoohi, Ali
    Chinen, Michael
    Denton, Tom
    Kleijn, W. Bastiaan
    Skoglund, Jan
    INTERSPEECH 2022, 2022, : 4421 - 4425
  • [22] A low complexity speech codec and its error protection
    Ikedo, J
    Kataoka, A
    IEICE TRANSACTIONS ON COMMUNICATIONS, 1997, E80B (11) : 1688 - 1695
  • [23] A New Speech Codec Based on ANN with Low Delay
    YANG Zhen (Nanjing University of Posts & Telecommunications)
    The Journal of China Universities of Posts and Telecommunications, 2002, (04) : 1 - 7
  • [24] OPTIMAL MULTI-CODEC ADAPTIVE BITRATE STREAMING
    Reznik, Yuriy A.
    Li, Xiangbo
    Lillevold, Karl O.
    Jagannath, Abhijith
    Greer, Justin
    2019 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA & EXPO WORKSHOPS (ICMEW), 2019, : 348 - 353
  • [25] Optimized Viterbi decoder for low data rate systems
    Bianchi, D.
    Cardarilli, G. C.
    Del Re, A.
    Re, M.
    2006 FORTIETH ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, VOLS 1-5, 2006, : 1166 - +
  • [26] Fast Randomization for Distributed Low-Bitrate Coding of Speech and Audio
    Backstrom, Tom
    Fischer, Johannes
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2018, 26 (01) : 19 - 30
  • [27] A CELP variable rate speech codec with low average rate
    Zhang, L
    Wang, T
    Cuperman, V
    1997 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS I - V: VOL I: PLENARY, EXPERT SUMMARIES, SPECIAL, AUDIO, UNDERWATER ACOUSTICS, VLSI; VOL II: SPEECH PROCESSING; VOL III: SPEECH PROCESSING, DIGITAL SIGNAL PROCESSING; VOL IV: MULTIDIMENSIONAL SIGNAL PROCESSING, NEURAL NETWORKS - VOL V: STATISTICAL SIGNAL AND ARRAY PROCESSING, APPLICATIONS, 1997, : 735 - 738
  • [28] An 8 kb/s low complexity ACELP speech codec
    Cheng, DY
    ICSP '96 - 1996 3RD INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, PROCEEDINGS, VOLS I AND II, 1996, : 671 - 674
  • [29] Steganography Integration Into a Low-Bit Rate Speech Codec
    Huang, Yongfeng
    Liu, Chenghao
    Tang, Shanyu
    Bai, Sen
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2012, 7 (06) : 1865 - 1875
  • [30] A high-fidelity speech and audio codec with low delay and low complexity
    Chen, JH
    2000 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS, VOLS I-VI, 2000, : 1161 - 1164