Fuzzy-Neural-Network Based Audio-Visual Fusion for Speech Recognition

被引:0
|
作者
Wu, Gin-Der [1 ]
Tsai, Hao-Shu [1 ]
机构
[1] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan
关键词
speech recognition; classification; type-2 fuzzy sets; linear-discriminant-analysis; discriminability;
D O I
10.1109/icaiic.2019.8669019
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speech recognition is an important classification problem in signal processing. Its performance is easily affected by noisy environment due to movements of desks, door slams, etc. To solve the problem, a fuzzy-neural-network based audio-visual fusion is proposed in this study. Since human speech perception is bimodal, the input features include both audio and image information. In the fuzzy-neural-network, type-2 fuzzy sets are used in the antecedent parts to deal with the noisy data. Furthermore, a linear-discriminant-analysis (LDA) is applied in to the consequent parts to increase the "discriminability". Compared with pure audio-based speech recognition, the fuzzy-neural-network based audio-visual fusion method is more robust in noisy environment.
引用
收藏
页码:210 / 214
页数:5
相关论文
共 50 条
  • [31] Audio-Visual Domain Adaptation Feature Fusion for Speech Emotion Recognition
    Wei, Jie
    Hu, Guanyu
    Yang, Xinyu
    Luu, Anh Tuan
    Dong, Yizhuo
    INTERSPEECH 2022, 2022, : 1988 - 1992
  • [32] Performance Improvement of Audio-Visual Speech Recognition with Optimal Reliability Fusion
    Tariquzzaman, Md
    Gyu, Song Min
    Young, Kim Jin
    You, Na Seung
    Rashid, M. A.
    2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL III, 2010, : 216 - 219
  • [33] Multimodal Attentive Fusion Network for audio-visual event recognition
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    INFORMATION FUSION, 2022, 85 : 52 - 59
  • [34] Decision Level Fusion for Audio-Visual Speech Recognition in Noisy Conditions
    Sad, Gonzalo D.
    Terissi, Lucas D.
    Gomez, Juan C.
    PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2016, 2017, 10125 : 360 - 367
  • [35] Audio-Visual Action Recognition Using Transformer Fusion Network
    Kim, Jun-Hwa
    Won, Chee Sun
    APPLIED SCIENCES-BASEL, 2024, 14 (03):
  • [36] CATNet: Cross-modal fusion for audio-visual speech recognition
    Wang, Xingmei
    Mi, Jiachen
    Li, Boquan
    Zhao, Yixu
    Meng, Jiaxiang
    PATTERN RECOGNITION LETTERS, 2024, 178 : 216 - 222
  • [37] An audio-visual speech recognition system for testing new audio-visual databases
    Pao, Tsang-Long
    Liao, Wen-Yuan
    VISAPP 2006: PROCEEDINGS OF THE FIRST INTERNATIONAL CONFERENCE ON COMPUTER VISION THEORY AND APPLICATIONS, VOL 2, 2006, : 192 - +
  • [38] LEARNING CONTEXTUALLY FUSED AUDIO-VISUAL REPRESENTATIONS FOR AUDIO-VISUAL SPEECH RECOGNITION
    Zhang, Zi-Qiang
    Zhang, Jie
    Zhang, Jian-Shu
    Wu, Ming-Hui
    Fang, Xin
    Dai, Li-Rong
    2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, : 1346 - 1350
  • [39] Audio-Visual Speech Recognition in Noisy Audio Environments
    Palecek, Karel
    Chaloupka, Josef
    2013 36TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS AND SIGNAL PROCESSING (TSP), 2013, : 484 - 487
  • [40] DEEP AUDIO-VISUAL FUSION NEURAL NETWORK FOR SALIENCY ESTIMATION
    Yao, Shunyu
    Min, Xiongkuo
    Zhai, Guangtao
    2021 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2021, : 1604 - 1608