IMPROVEMENTS TO FILTERBANK AND DELTA LEARNING WITHIN A DEEP NEURAL NETWORK FRAMEWORK

Cited by: 0

Authors:

Sainath, Tara N. [1]

Kingsbury, Brian [1]

Mohamed, Abdel-rahman

Saon, George [1]

Ramabhadran, Bhuvana [1]

Affiliations:

[1] IBM TJ Watson Res Ctr, Yorktown Hts, NY 10598 USA

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

SPEECH RECOGNITION;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206 ; 082403 ;

Abstract:

Many features used in speech recognition tasks are hand-crafted and are not always related to the objective at hand, that is, minimizing word error rate. Recently, we showed that replacing a perceptually motivated mel-filter bank with a filter bank layer that is learned jointly with the rest of a deep neural network was promising. In this paper, we extend filter learning to a speaker-adapted, state-of-the-art system. First, we incorporate delta learning into the filter learning framework. Second, we incorporate various speaker adaptation techniques, including VTLN warping and speaker identity features. On a 50-hour English Broadcast News task, we show that we can achieve a 5% relative improvement in word error rate (WER) using filter and delta learning, compared to having a fixed set of filters and deltas. Furthermore, after speaker adaptation, we find that filter and delta learning allows for a 3% relative improvement in WER compared to a state-of-the-art CNN.


Pages: 5
