Spatial position constraint for unsupervised learning of speech representations

Cited by: 2

Authors:

Humayun, Mohammad Ali [1]

Yassin, Hayati [1]

Abas, Pg Emeroylariffion [1]

Affiliations:

[1] Univ Brunei Darussalam, Fac Integrated Technol, Jalan Tungku Link, Gadong, Brunei

Source:

PEERJ COMPUTER SCIENCE | 2021 / Vol. 7

Keywords:

Low resource speech; Representation learning; Multitasking; Geometric constraint

DOI:

10.7717/peerj-cs.650

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The success of supervised learning techniques for automatic speech processing does not always extend to problems with limited annotated speech. Unsupervised representation learning aims to use unlabelled data to learn a transformation that makes speech easily distinguishable for classification tasks, and deep auto-encoder variants have been the most successful at finding such representations. This paper proposes a novel mechanism to incorporate the geometric position of speech samples within the global structure of an unlabelled feature set. Regression to the geometric position is added as a further constraint on the representation-learning auto-encoder. The representation learnt by the proposed model has been evaluated on a supervised classification task for limited-vocabulary keyword spotting, where it outperforms the commonly used cepstral features by about 9% in classification accuracy, despite using a limited amount of labels during supervision. Furthermore, a small keyword dataset has been collected for Kadazan, an indigenous, low-resourced Southeast Asian language. Analysis of the Kadazan dataset also confirms the superiority of the proposed representation under limited annotation. The results are significant because they confirm that the proposed method can learn unsupervised speech representations effectively for classification tasks with scarce labelled data.
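The abstract describes an auto-encoder trained with an extra regression target, the geometric position of each sample within the unlabelled feature set. The sketch below illustrates that kind of multitask objective only. It is not the authors' implementation: the abstract does not say how the position is computed, so `spatial_positions` (distances to a few k-means anchors), the single-layer `tanh` encoder, the weight matrices, and the `weight` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

def spatial_positions(features, n_anchors=4, n_iters=10, seed=0):
    """Assumed position target: distance of each sample to a few anchors
    found by a plain k-means pass over the unlabelled feature set."""
    rng = np.random.default_rng(seed)
    anchors = features[rng.choice(len(features), n_anchors, replace=False)]
    for _ in range(n_iters):
        dists = np.linalg.norm(features[:, None, :] - anchors[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for k in range(n_anchors):
            members = features[labels == k]
            if len(members):  # keep the old anchor if a cluster empties
                anchors[k] = members.mean(axis=0)
    # Distances to the final anchors serve as the regression target.
    return np.linalg.norm(features[:, None, :] - anchors[None, :, :], axis=2)

def multitask_loss(x, positions, w_enc, w_dec, w_pos, weight=1.0):
    """Reconstruction loss plus a weighted position-regression loss,
    both computed from the same encoded representation z."""
    z = np.tanh(x @ w_enc)    # shared representation
    recon = z @ w_dec         # auto-encoder branch
    pos_pred = z @ w_pos      # position-regression branch
    recon_loss = np.mean((recon - x) ** 2)
    pos_loss = np.mean((pos_pred - positions) ** 2)
    return recon_loss + weight * pos_loss, recon_loss, pos_loss

# Random stand-in data: 32 frames of 13-dimensional features (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 13))
positions = spatial_positions(x)
w_enc = rng.normal(scale=0.1, size=(13, 8))
w_dec = rng.normal(scale=0.1, size=(8, 13))
w_pos = rng.normal(scale=0.1, size=(8, positions.shape[1]))
total, recon_loss, pos_loss = multitask_loss(x, positions, w_enc, w_dec, w_pos, weight=0.5)
```

In a real training loop the weights would be updated by gradient descent on `total`; setting `weight=0.0` reduces the objective to a plain auto-encoder, which gives a natural baseline for judging whether the position constraint helps.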

Pages: 24
