Distinctive feature fusion for improved audio-visual phoneme recognition

被引:0
|
作者
Lewis, T [1 ]
Powers, D [1 ]
机构
[1] Flinders Univ S Australia, Sch Informat & Engn, Adelaide, SA 5001, Australia
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Auditory and visual signals provide complementary information but few applications successfully combine the two sources. We consider a distinctive feature approach to Audio Visual Automatic Speech Recognition (AV-ASR) in which features appropriate to each modality are employed, and demonstrate that in the absence of knowledge about the noise the modality-specific approach is best. However even information from the non-preferred modality can be usefully employed if the environmental context (e.g. SNR) is accounted for by adaptively weighting each modality. Future research is focusing on deriving these distinctive feature automatically from data rather than using those proposed by linguists.
引用
收藏
页码:62 / 65
页数:4
相关论文
共 50 条
  • [31] Multimodal Attentive Fusion Network for audio-visual event recognition
    Brousmiche, Mathilde
    Rouat, Jean
    Dupont, Stephane
    INFORMATION FUSION, 2022, 85 : 52 - 59
  • [32] Audio-visual recognition system with intra-modal fusion
    Wong, Yee Wan
    Seng, Kah Phooi
    Ang, Li-Minn
    Khor, Wan Yong
    Liau, Heng Fui
    CIS: 2007 INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY, PROCEEDINGS, 2007, : 609 - 613
  • [33] Incremental Audio-Visual Fusion for Person Recognition in Earthquake Scene
    You, Sisi
    Zuo, Yukun
    Yao, Hantao
    Xu, Changsheng
    ACM TRANSACTIONS ON MULTIMEDIA COMPUTING COMMUNICATIONS AND APPLICATIONS, 2024, 20 (02)
  • [34] Semantic audio-visual data fusion for automatic emotion recognition
    Datcu, Dragos
    Rothkrantz, Leon J. M.
    EUROMEDIA '2008, 2008, : 58 - 65
  • [35] Audio-Visual Action Recognition Using Transformer Fusion Network
    Kim, Jun-Hwa
    Won, Chee Sun
    APPLIED SCIENCES-BASEL, 2024, 14 (03):
  • [36] Cross Attentional Audio-Visual Fusion for Dimensional Emotion Recognition
    Praveen, R. Gnana
    Granger, Eric
    Cardinal, Patrick
    2021 16TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2021), 2021,
  • [37] Robust Audio-Visual Speech Recognition Based on Hybrid Fusion
    Liu, Hong
    Li, Wenhao
    Yang, Bing
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 7580 - 7586
  • [38] An audio-visual speech recognition with a new mandarin audio-visual database
    Liao, Wen-Yuan
    Pao, Tsang-Long
    Chen, Yu-Te
    Chang, Tsun-Wei
    INT CONF ON CYBERNETICS AND INFORMATION TECHNOLOGIES, SYSTEMS AND APPLICATIONS/INT CONF ON COMPUTING, COMMUNICATIONS AND CONTROL TECHNOLOGIES, VOL 1, 2007, : 19 - +
  • [39] Improved speech recognition using adaptive audio-visual fusion via a stochastic secondary classifier
    Lucey, S
    Sridharan, S
    Chandran, V
    PROCEEDINGS OF 2001 INTERNATIONAL SYMPOSIUM ON INTELLIGENT MULTIMEDIA, VIDEO AND SPEECH PROCESSING, 2001, : 551 - 554
  • [40] Audio-Visual Fusion for Sound Source Localization and Improved Attention
    Lee, Byoung-gi
    Choi, JongSuk
    Yoon, SangSuk
    Choi, Mun-Taek
    Kim, Munsang
    Kim, Daijin
    TRANSACTIONS OF THE KOREAN SOCIETY OF MECHANICAL ENGINEERS A, 2011, 35 (07) : 737 - 743