JOINT ENCODING OF THE WAVEFORM AND SPEECH RECOGNITION FEATURES USING A TRANSFORM CODEC

Cited by: 0
Authors
Fan, Xing [1]
Seltzer, Michael L. [1]
Droppo, Jasha [1]
Malvar, Henrique S. [1]
Acero, Alex [1]
Affiliation
[1] Microsoft Res, Redmond, WA 98052 USA
Keywords
transform coding; speech coding; distributed speech recognition; Siren codec
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We propose a new transform speech codec that jointly encodes a wideband waveform and its corresponding wideband and narrowband speech recognition features. For distributed speech recognition, wideband features are compressed and transmitted as side information. The waveform is then encoded in a manner that exploits the information already captured by the speech features. Narrowband speech acoustic features can be synthesized at the server by applying a transformation to the decoded wideband features. An evaluation conducted on an in-car speech recognition task shows that at 16 kbps our new system typically has essentially no impact on word error rate compared to uncompressed audio, whereas the standard transform codec produces up to a 20% increase in word error rate. In addition, good quality speech is obtained for playback and transcription, with PESQ scores ranging from 3.2 to 3.4.
Pages: 5148-5151
Page count: 4