Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

Cited by: 18
Authors
Cernak, Milos [1 ]
Lazaridis, Alexandros [1 ]
Asaei, Afsaneh [1 ]
Garner, Philip N. [1 ]
Affiliations
[1] Idiap Res Inst, Ctr Parc, CH-1920 Martigny, Switzerland
Funding
Swiss National Science Foundation
Keywords
Very low bit rate speech coding; deep neural networks; spiking neural networks; continuous F0 coding; RECOGNITION; ATTRIBUTE;
DOI
10.1109/TASLP.2016.2604566
Chinese Library Classification number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Most current very low bit rate (VLBR) speech coding systems use hidden Markov model (HMM) based speech recognition and synthesis techniques. This allows information (such as phonemes) to be transmitted segment by segment, which decreases the bit rate. However, an encoder based on phoneme speech recognition may create bursts of segmental errors, which are then propagated to any coding of suprasegmental (such as syllable) information. Together with voicing detection errors in pitch parametrization, HMM-based speech coding leads to speech discontinuities and unnatural speech sound artifacts. In this paper, we propose a novel VLBR speech coding framework based on neural networks (NNs) for end-to-end speech analysis and synthesis without HMMs. The speech coding framework relies on a phonological (subphonetic) representation of speech. It is designed as a composition of deep and spiking NNs: a bank of phonological analyzers at the transmitter and a phonological synthesizer at the receiver, both realized as deep NNs, along with a spiking NN as an incremental and robust encoder of syllable boundaries for coding of continuous fundamental frequency (F0). A combination of phonological features defines many more sound patterns than the phonetic features defined by HMM-based speech coders; this finer analysis/synthesis code contributes to smoother encoded speech. Listeners significantly prefer the NN-based approach due to fewer discontinuities and speech artifacts in the encoded speech. A single forward pass is required during speech encoding and decoding. The proposed VLBR speech coding operates at a bit rate of approximately 360 bits/s.
Pages: 2301-2312
Number of pages: 12
Related papers
50 in total
  • [41] Very low bit-rate digital video coding
    Scargall, Lee
    Dlay, Satnam
    Advances in Intelligent Systems and Computer Science, 1999, : 273 - 279
  • [42] Speech coding at very low bit-rates for mobile communication
    Gandhi, AG
    Dhekane, SS
    APCC 2003: 9TH ASIA-PACIFIC CONFERENCE ON COMMUNICATION, VOLS 1-3, PROCEEDINGS, 2003, : 358 - 362
  • [43] Pitch quantization in low bit-rate speech coding
    Eriksson, T
    Kang, HG
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 489 - 492
  • [44] TTS based very low bit rate speech coder
    Lee, Ki-Seung
    Cox, Richard V.
    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 1999, 1 : 181 - 184
  • [45] TTS based very low bit rate speech coder
    Lee, KS
    Cox, RV
    ICASSP '99: 1999 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, PROCEEDINGS VOLS I-VI, 1999, : 181 - 184
  • [46] SIGNAL MODELS FOR LOW BIT-RATE CODING OF SPEECH
    FLANAGAN, JL
    ISHIZAKA, K
    SHIPLEY, KL
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1980, 68 (03): : 780 - 791
  • [47] VLSI design of a very low bit rate speech decoder
    Wang, JC
    Wang, JF
    Chao, YF
    Shi, MC
    PROCEEDINGS OF THE THIRD IASTED INTERNATIONAL CONFERENCE ON CIRCUITS, SIGNALS, AND SYSTEMS, 2005, : 239 - 243
  • [48] Enhancement artificial neural networks for low-bit rate speech compression system
    Srinonchat, J.
    2006 INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES,VOLS 1-3, 2006, : 986 - 989
  • [49] FAST LOW BIT RATE LATTICE ENTROPY CODING FOR SPEECH AND AUDIO CODING
    Vasilache, Adriana
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 719 - 723
  • [50] Adaptive rate control scheme for very low bit rate video coding
    Oh, HS
    Lee, HK
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 1996, 42 (04) : 974 - 980