Generalized Hough Transform for Speech Pattern Classification

Cited by: 2
Authors
Dennis, Jonathan [1]
Huy Dat Tran [1]
Li, Haizhou [1]
Affiliations
[1] A*STAR Institute for Infocomm Research, Singapore 138632, Singapore
Keywords
Codebook activation map; generalized Hough transform; speech pattern classification; TIMIT; object detection; neural networks; image feature; features
DOI
10.1109/TASLP.2015.2459599
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
While typical hybrid neural network architectures for automatic speech recognition (ASR) use a context window of frame-based features, this may not be the best approach to capture the wider temporal context, which contains phonetic and linguistic information that is equally important. In this paper, we introduce a system that integrates both the spectral and geometrical shape information from the acoustic spectrum, inspired by research in the field of machine vision. In particular, we focus on the Generalized Hough Transform (GHT), which is a sophisticated technique that can model the geometrical distribution of speech information over the wider temporal context. To integrate the GHT as part of a hybrid-ASR system, we propose to use a neural network, with features derived from the probabilistic Hough voting step of the GHT, to implement an improved version of the GHT where the output of the network represents the conventional target class posteriors. A major advantage of our approach is that each step of the GHT is highly interpretable, particularly compared to deep neural network (DNN) systems which are commonly treated as powerful black-box classifiers that give little insight into how the output is achieved. Experiments are carried out on two speech pattern classification tasks. The first is the TIMIT phoneme classification, which demonstrates the performance of the approach on a standard ASR task. The second is a spoken word recognition challenge, which highlights the flexibility of the approach to capture phonetic information within a longer temporal context.
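The Hough voting step mentioned in the abstract can be illustrated with a minimal sketch of classical GHT voting. This is not the authors' implementation: the function name `ght_vote`, the R-table layout (codeword index mapped to weighted offsets), and the toy keypoints are all hypothetical, chosen only to show how local features matched to codebook entries cast weighted votes for a reference point in a 2-D accumulator, such as a time-frequency plane.

```python
import numpy as np

def ght_vote(keypoints, r_table, shape):
    """Accumulate weighted Hough votes for a reference point.

    keypoints: iterable of (row, col, codeword) tuples detected in the input.
    r_table:   dict mapping codeword -> list of (d_row, d_col, weight) offsets
               (hypothetical format; in practice learned from training data).
    shape:     (rows, cols) of the accumulator array.
    """
    acc = np.zeros(shape, dtype=float)
    for r, c, k in keypoints:
        # Each matched codeword votes at every offset stored for it.
        for dr, dc, w in r_table.get(k, []):
            rr, cc = r + dr, c + dc
            # Discard votes that fall outside the accumulator.
            if 0 <= rr < shape[0] and 0 <= cc < shape[1]:
                acc[rr, cc] += w
    return acc

# Toy data (assumed for illustration only).
r_table = {0: [(1, 2, 1.0)], 1: [(-1, -1, 0.5), (0, 1, 0.5)]}
keypoints = [(2, 1, 0), (4, 4, 1), (3, 2, 1)]

acc = ght_vote(keypoints, r_table, (6, 6))
peak = np.unravel_index(np.argmax(acc), acc.shape)
```

In this toy case, three of the five votes land on the same cell, so the accumulator peak marks the most consistent reference-point hypothesis. The paper instead feeds features derived from the voting step into a neural network, which this sketch does not attempt to reproduce.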
Pages: 1963-1972
Page count: 10