IMPROVEMENTS TO FILTERBANK AND DELTA LEARNING WITHIN A DEEP NEURAL NETWORK FRAMEWORK

Authors
Sainath, Tara N. [1]
Kingsbury, Brian [1]
Mohamed, Abdel-rahman
Saon, George [1]
Ramabhadran, Bhuvana [1]
Affiliation
[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA
Keywords
SPEECH RECOGNITION
DOI
Not available
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is, minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel filter bank with a filter bank layer learned jointly with the rest of a deep neural network (DNN) was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including vocal tract length normalization (VTLN) warping and speaker identity features. On a 50-hour English Broadcast News task, we show that filter and delta learning achieve a 5% relative improvement in word error rate (WER) compared to a fixed set of filters and deltas. Furthermore, after speaker adaptation, filter and delta learning yield a 3% relative improvement in WER over a state-of-the-art convolutional neural network (CNN).
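The abstract describes replacing a fixed mel filter bank and a fixed delta computation with layers that are learned inside the network. The NumPy sketch below shows only a plausible forward pass, under assumptions this abstract does not state: the filter bank is a positive (F, K) weight matrix (kept positive by exponentiating unconstrained parameters) applied to power-spectrum frames and followed by log compression, and deltas are a short temporal filter whose coefficients start at the standard plus/minus 2 frame regression values. The names `filterbank_layer`, `delta_layer`, `static_delta_features` and `DELTA_INIT` are hypothetical, and training (backpropagating the acoustic-model loss into `log_weights` and the delta coefficients) is not shown.

```python
import numpy as np

# Standard delta regression coefficients for a +/-2 frame window (starting point only):
# d_t = sum_{n=1..2} n * (c_{t+n} - c_{t-n}) / (2 * (1^2 + 2^2))
DELTA_INIT = np.array([-2.0, -1.0, 0.0, 1.0, 2.0]) / 10.0


def filterbank_layer(power_spec, log_weights, eps=1e-8):
    """Map (T, F) power-spectrum frames to (T, K) log filter-bank energies.

    Assumption of this sketch: log_weights is an unconstrained (F, K) matrix and
    exponentiating it keeps every filter weight positive.
    """
    weights = np.exp(log_weights)          # (F, K) positive filter weights
    energies = power_spec @ weights        # (T, K) filter-bank energies
    return np.log(energies + eps)          # log compression


def delta_layer(feats, coeffs):
    """Apply one odd-length temporal filter `coeffs` to every channel of (T, K) feats.

    out[t] = sum_j coeffs[j] * feats[t + j - n], repeating edge frames at the
    boundaries. In a learned setup, `coeffs` would be a trainable parameter.
    """
    n = len(coeffs) // 2
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")
    num_frames = feats.shape[0]
    out = np.zeros_like(feats, dtype=float)
    for j, c in enumerate(coeffs):
        out += c * padded[j:j + num_frames]
    return out


def static_delta_features(power_spec, log_weights, delta_coeffs, accel_coeffs):
    """Concatenate static, delta and delta-delta features into a (T, 3K) matrix."""
    static = filterbank_layer(power_spec, log_weights)
    delta = delta_layer(static, delta_coeffs)
    accel = delta_layer(delta, accel_coeffs)   # delta-delta = delta of the deltas
    return np.concatenate([static, delta, accel], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    power_spec = rng.random((100, 257))              # 100 frames, 257 FFT bins (illustrative)
    log_weights = rng.normal(-3.0, 0.1, (257, 40))   # 40 filters, arbitrary init
    feats = static_delta_features(power_spec, log_weights, DELTA_INIT, DELTA_INIT)
    print(feats.shape)  # (100, 120)
```

Writing the deltas as a fixed-length linear filter is what makes them learnable: the usual regression formula is just one setting of `coeffs`, and gradient descent could move away from it. This abstract does not say how the paper parameterizes its filters or deltas, so treat this as one possible formulation.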
Pages: 5