Fuzzy-Neural-Network Based Audio-Visual Fusion for Speech Recognition

被引:0
|
作者
Wu, Gin-Der [1 ]
Tsai, Hao-Shu [1 ]
机构
[1] Natl Chi Nan Univ, Dept Elect Engn, Nantou, Taiwan
关键词
speech recognition; classification; type-2 fuzzy sets; linear-discriminant-analysis; discriminability;
D O I
10.1109/icaiic.2019.8669019
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Speech recognition is an important classification problem in signal processing. Its performance is easily affected by noisy environment due to movements of desks, door slams, etc. To solve the problem, a fuzzy-neural-network based audio-visual fusion is proposed in this study. Since human speech perception is bimodal, the input features include both audio and image information. In the fuzzy-neural-network, type-2 fuzzy sets are used in the antecedent parts to deal with the noisy data. Furthermore, a linear-discriminant-analysis (LDA) is applied in to the consequent parts to increase the "discriminability". Compared with pure audio-based speech recognition, the fuzzy-neural-network based audio-visual fusion method is more robust in noisy environment.
引用
收藏
页码:210 / 214
页数:5
相关论文
共 50 条
  • [21] Deep Audio-Visual Speech Recognition
    Afouras, Triantafyllos
    Chung, Joon Son
    Senior, Andrew
    Vinyals, Oriol
    Zisserman, Andrew
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (12) : 8717 - 8727
  • [22] Audio-visual integration for speech recognition
    Kober, R
    Harz, U
    NEUROLOGY PSYCHIATRY AND BRAIN RESEARCH, 1996, 4 (04) : 179 - 184
  • [23] AUDIO-VISUAL FUSION AND CONDITIONING WITH NEURAL NETWORKS FOR EVENT RECOGNITION
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    2019 IEEE 29TH INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2019,
  • [24] MULTIPOSE AUDIO-VISUAL SPEECH RECOGNITION
    Estellers, Virginia
    Thiran, Jean-Philippe
    19TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO-2011), 2011, : 1065 - 1069
  • [25] Audio-visual speech recognition by speechreading
    Zhang, XZ
    Mersereau, RM
    Clements, MA
    DSP 2002: 14TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING PROCEEDINGS, VOLS 1 AND 2, 2002, : 1069 - 1072
  • [26] Multimodal Sparse Transformer Network for Audio-Visual Speech Recognition
    Song, Qiya
    Sun, Bin
    Li, Shutao
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (12) : 10028 - 10038
  • [27] Attentive Fusion Enhanced Audio-Visual Encoding for Transformer Based Robust Speech Recognition
    Wei, Liangfa
    Zhang, Jie
    Hou, Junfeng
    Dai, Lirong
    2020 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2020, : 638 - 643
  • [28] A LIP GEOMETRY APPROACH FOR FEATURE-FUSION BASED AUDIO-VISUAL SPEECH RECOGNITION
    Ibrahim, M. Z.
    Mulvaney, D. J.
    2014 6TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS, CONTROL AND SIGNAL PROCESSING (ISCCSP), 2014, : 644 - 647
  • [29] Optimum integration weight for decision fusion audio-visual speech recognition
    Rajavel, R.
    Sathidevi, P. S.
    INTERNATIONAL JOURNAL OF COMPUTATIONAL SCIENCE AND ENGINEERING, 2015, 10 (1-2) : 145 - 154
  • [30] Fish behavior recognition based on an audio-visual multimodal interactive fusion network
    Yang, Yuxin
    Yu, Hong
    Zhang, Xin
    Zhang, Peng
    Tu, Wan
    Gu, Lishuai
    AQUACULTURAL ENGINEERING, 2024, 107