Neurally Optimized Decoder for Low Bitrate Speech Codec

Cited by: 1
Authors
Kim, Hyung Yong [1, 2]
Yoon, Ji Won [1, 2]
Cho, Won Ik [1, 2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 08826, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 08826, South Korea
Keywords
Decoding; Speech coding; Speech codecs; Bit rate; Encoding; Convolution; Knowledge engineering; generative adversarial network; generative model; attention mechanism; NETWORKS
DOI
10.1109/LSP.2021.3132557
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract
Recently, conventional neural decoders for speech codecs have shown promising performance. However, they typically require prior knowledge of the decoding process, such as the bit allocation or dequantization scheme, so they do not offer a universal solution across different kinds of speech codecs. To address this limitation, we propose a neurally optimized decoder based on a generative model that directly reconstructs speech from the bitstream without prior knowledge. The proposed decoder consists mainly of two components: 1) a dequantization model that groups and dequantizes related bits from the bitstream and 2) a generative model that restores the speech conditioned on the output of the dequantization model. In experiments with mixed excitation linear prediction (MELP), advanced multi-band excitation (AMBE), and SPEEX at around 2.4 kb/s, the proposed model outperformed the conventional speech codecs on most objective and subjective evaluations.
Pages: 244-248
Page count: 5