JOINT ENCODING OF THE WAVEFORM AND SPEECH RECOGNITION FEATURES USING A TRANSFORM CODEC

Authors
Fan, Xing [1]
Seltzer, Michael L. [1]
Droppo, Jasha [1]
Malvar, Henrique S. [1]
Acero, Alex [1]
Affiliation
[1] Microsoft Research, Redmond, WA 98052, USA
Keywords
transform coding; speech coding; distributed speech recognition; Siren codec
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We propose a new transform speech codec that jointly encodes a wideband waveform and its corresponding wideband and narrowband speech recognition features. For distributed speech recognition, the wideband features are compressed and transmitted as side information. The waveform is then encoded in a manner that exploits the information already captured by the speech features. Narrowband acoustic features can be synthesized at the server by applying a transformation to the decoded wideband features. An evaluation on an in-car speech recognition task shows that at 16 kbps the new system typically has essentially no impact on word error rate compared with uncompressed audio, whereas the standard transform codec produces up to a 20% increase in word error rate. In addition, the decoded speech is of good quality for playback and transcription, with PESQ scores ranging from 3.2 to 3.4.
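The abstract does not say what form the wideband-to-narrowband feature transformation takes. As a minimal sketch, assuming it is a linear mapping between linear-domain (not log) mel filterbank energies, learned by least squares from paired wideband/narrowband training frames, the server-side step could look like the following. The function names, the 40 wideband and 23 narrowband channel counts, and the least-squares formulation are all illustrative assumptions, not the authors' method.

```python
import numpy as np


def fit_wb_to_nb_transform(wb_mel_power, nb_mel_power):
    """Fit a matrix A such that nb_mel_power is approximately wb_mel_power @ A.T.

    ASSUMPTION: the wideband-to-narrowband mapping is linear in the
    linear-energy mel domain; this is an illustration, not the paper's method.

    wb_mel_power: (frames, n_wb) linear-domain wideband mel energies
    nb_mel_power: (frames, n_nb) linear-domain narrowband mel energies
    Returns A with shape (n_nb, n_wb).
    """
    # lstsq solves wb @ X ~= nb for X of shape (n_wb, n_nb); A is its transpose.
    X, *_ = np.linalg.lstsq(wb_mel_power, nb_mel_power, rcond=None)
    return X.T


def wideband_to_narrowband_logmel(wb_log_mel, A, floor=1e-10):
    """Map decoded wideband log-mel features to narrowband log-mel features."""
    wb_power = np.exp(wb_log_mel)      # back to the linear energy domain
    nb_power = wb_power @ A.T          # linear mixing of filterbank energies
    # An unconstrained fit can yield tiny negative energies; floor before log.
    return np.log(np.maximum(nb_power, floor))


if __name__ == "__main__":
    # Synthetic paired data (hypothetical sizes: 40 wideband, 23 narrowband bands).
    rng = np.random.default_rng(0)
    A_true = rng.uniform(0.0, 1.0, size=(23, 40))
    wb = rng.uniform(0.1, 1.0, size=(200, 40))
    nb = wb @ A_true.T

    A_hat = fit_wb_to_nb_transform(wb, nb)
    nb_log = wideband_to_narrowband_logmel(np.log(wb), A_hat)
    print("max |A_hat - A_true|:", np.abs(A_hat - A_true).max())
    print("narrowband feature shape:", nb_log.shape)
```

Working in the linear energy domain keeps the mapping a simple mixing of filterbank outputs; the floor in the log step guards against small negative energies that an unconstrained least-squares fit can produce on real data.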
Pages: 5148-5151
Number of pages: 4