A robust visual feature extraction based BTSM-LDA for audio-visual speech recognition

Cited by: 0
Authors
Lv, Guoyun [1]
Zhao, Rongchun [1]
Jiang, Dongmei [1]
Li, Yan [1]
Sahli, H. [2]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[2] Vrije Univ Brussel, Dept ETRO, B-1050 Brussels, Belgium
Keywords
dynamic Bayesian networks; Bayesian tangent shape model; audio-visual; speech recognition
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The asynchrony between speech and lip movement is a key problem for audio-visual speech recognition (AVSR) systems. A Multi-Stream Asynchrony Dynamic Bayesian Network (MS-ADBN) model is proposed for audio-visual speech recognition. Compared with the Multi-Stream HMM (MSHMM), the MS-ADBN model describes the asynchrony between the audio stream and the visual stream up to the word level. In addition, starting from the lip profile extracted with the Bayesian Tangent Shape Model (BTSM), Linear Discriminant Analysis (LDA) is used for visual feature extraction; the resulting features describe the dynamics of the lips and remove the redundancy in the lip geometrical features. Experimental results on a continuous-digit audio-visual database show that the lip dynamic feature based on BTSM and LDA is more stable and robust than the direct lip geometrical feature. In noisy environments with signal-to-noise ratios ranging from 0 dB to 30 dB, the MS-ADBN model with MFCC and LDA visual features achieves an average improvement of 4.92% in speech recognition rate over the MSHMM.
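The LDA step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no details of the feature layout, so the code assumes hypothetical per-frame lip geometry vectors (six measurements per frame) with class labels, generated synthetically here, and applies classical Fisher LDA to project them onto a lower-dimensional, more discriminative space. The function name `fit_lda` and all data are assumptions made for illustration.

```python
import numpy as np


def fit_lda(features, labels, n_components):
    """Return a projection matrix (n_features x n_components) from Fisher LDA."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    overall_mean = features.mean(axis=0)
    n_features = features.shape[1]
    within = np.zeros((n_features, n_features))   # within-class scatter S_w
    between = np.zeros((n_features, n_features))  # between-class scatter S_b
    for cls in np.unique(labels):
        rows = features[labels == cls]
        centered = rows - rows.mean(axis=0)
        within += centered.T @ centered
        shift = (rows.mean(axis=0) - overall_mean)[:, None]
        between += rows.shape[0] * (shift @ shift.T)
    # Solve S_w^{-1} S_b w = lambda w; pinv guards against a singular S_w.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(within) @ between)
    order = np.argsort(eigvals.real)[::-1][:n_components]
    return eigvecs.real[:, order]


# Synthetic stand-in for lip geometry (assumption): 3 classes, 6 measurements per frame.
rng = np.random.default_rng(0)
class_means = rng.normal(scale=3.0, size=(3, 6))
labels = np.repeat(np.arange(3), 50)
features = class_means[labels] + rng.normal(size=(150, 6))

projection = fit_lda(features, labels, n_components=2)
visual_features = features @ projection  # one 2-D visual feature vector per frame
print(visual_features.shape)
```

With C classes, LDA yields at most C - 1 useful directions, which is why the three-class toy data is reduced to two dimensions. In a real system the input vectors would come from the tracked lip contour rather than from random numbers.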
Pages: 1044 / +
Page count: 2