Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding

Cited by: 18

Authors:

Cernak, Milos [1]

Lazaridis, Alexandros [1]

Asaei, Afsaneh [1]

Garner, Philip N. [1]

Affiliations:

[1] Idiap Res Inst, Ctr Parc, CH-1920 Martigny, Switzerland

Source:

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2016, Vol. 24, No. 12

Funding:

Swiss National Science Foundation

Keywords:

Very low bit rate speech coding; deep neural networks; spiking neural networks; continuous F0 coding; RECOGNITION; ATTRIBUTE;

DOI:

10.1109/TASLP.2016.2604566

Chinese Library Classification number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Most current very low bit rate (VLBR) speech coding systems use hidden Markov model (HMM) based speech recognition and synthesis techniques. This allows transmission of information (such as phonemes) segment by segment; this decreases the bit rate. However, an encoder based on phoneme speech recognition may create bursts of segmental errors; these would be further propagated to any suprasegmental (such as syllable) information coding. Together with the errors of voicing detection in pitch parametrization, HMM-based speech coding leads to speech discontinuities and unnatural speech sound artifacts. In this paper, we propose a novel VLBR speech coding framework based on neural networks (NNs) for end-to-end speech analysis and synthesis without HMMs. The speech coding framework relies on a phonological (subphonetic) representation of speech. It is designed as a composition of deep and spiking NNs: a bank of phonological analyzers at the transmitter, and a phonological synthesizer at the receiver. These are both realized as deep NNs, along with a spiking NN as an incremental and robust encoder of syllable boundaries for coding of continuous fundamental frequency (F0). A combination of phonological features defines many more sound patterns than phonetic features defined by HMM-based speech coders; this finer analysis/synthesis code contributes to smoother encoded speech. Listeners significantly prefer the NN-based approach due to fewer discontinuities and speech artifacts of the encoded speech. A single forward pass is required during the speech encoding and decoding. The proposed VLBR speech coding operates at a bit rate of approximately 360 bits/s.


Pages: 2301-2312

Number of pages: 12
