Distinctive feature fusion for improved audio-visual phoneme recognition

Cited by: 0
Authors
Lewis, T [1]
Powers, D [1]
Affiliations
[1] Flinders Univ S Australia, Sch Informat & Engn, Adelaide, SA 5001, Australia
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Auditory and visual signals provide complementary information, but few applications successfully combine the two sources. We consider a distinctive feature approach to Audio-Visual Automatic Speech Recognition (AV-ASR) in which features appropriate to each modality are employed, and demonstrate that, in the absence of knowledge about the noise, the modality-specific approach is best. However, even information from the non-preferred modality can be usefully employed if the environmental context (e.g. SNR) is accounted for by adaptively weighting each modality. Future research is focusing on deriving these distinctive features automatically from data rather than using those proposed by linguists.
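
The adaptive weighting mentioned in the abstract can be illustrated with a generic late-fusion sketch. This is not the authors' method, which the abstract does not specify; it only assumes that each modality yields per-phoneme log-scores and that an SNR estimate in dB is available. The logistic mapping, its midpoint_db and slope parameters, the function names, and the toy scores are all hypothetical choices made for illustration.

```python
import math


def audio_weight(snr_db, midpoint_db=10.0, slope=0.3):
    """Map an SNR estimate (dB) to an audio weight in (0, 1).

    Hypothetical logistic mapping: clean audio pushes the weight
    towards 1, noisy audio pushes it towards 0.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse(audio_logp, visual_logp, snr_db):
    """Combine per-phoneme log-scores as a weighted sum over modalities."""
    w = audio_weight(snr_db)
    shared = audio_logp.keys() & visual_logp.keys()
    return {p: w * audio_logp[p] + (1.0 - w) * visual_logp[p] for p in shared}


def decide(audio_logp, visual_logp, snr_db):
    """Return the phoneme with the highest fused score."""
    fused = fuse(audio_logp, visual_logp, snr_db)
    return max(fused, key=fused.get)


# Invented toy scores: the audio stream favours /d/, the visual stream /b/.
audio = {"b": math.log(0.3), "d": math.log(0.7)}
visual = {"b": math.log(0.9), "d": math.log(0.1)}

clean_choice = decide(audio, visual, snr_db=30.0)   # audio dominates
noisy_choice = decide(audio, visual, snr_db=-10.0)  # visual dominates
```

With these invented scores, the decision follows the audio stream at high SNR and the visual stream at low SNR, which is the behaviour an SNR-dependent weight is meant to produce. A fixed weight would correspond to ignoring the environmental context.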
Pages: 62-65
Number of pages: 4