Distinctive feature fusion for improved audio-visual phoneme recognition

Cited by: 0
Authors
Lewis, T [1 ]
Powers, D [1 ]
Affiliations
[1] Flinders Univ S Australia, Sch Informat & Engn, Adelaide, SA 5001, Australia
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Auditory and visual signals provide complementary information, but few applications successfully combine the two sources. We consider a distinctive feature approach to Audio-Visual Automatic Speech Recognition (AV-ASR) in which features appropriate to each modality are employed, and demonstrate that, in the absence of knowledge about the noise, the modality-specific approach is best. However, even information from the non-preferred modality can be usefully employed if the environmental context (e.g. SNR) is accounted for by adaptively weighting each modality. Future research is focusing on deriving these distinctive features automatically from data rather than using those proposed by linguists.
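The adaptive weighting mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes a common log-linear late-fusion scheme in which per-class log-probabilities from each modality are combined with a weight that depends on the SNR. The function names (audio_weight, fuse), the logistic mapping, its midpoint_db and slope parameters, and the toy probabilities are all hypothetical and chosen only for illustration.

```python
import math


def audio_weight(snr_db, midpoint_db=10.0, slope=0.3):
    """Map SNR in dB to an audio weight in (0, 1).

    The logistic form and its parameters are assumptions for this
    sketch, not values taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-slope * (snr_db - midpoint_db)))


def fuse(audio_logp, visual_logp, snr_db):
    """Log-linear fusion of per-class log-probabilities.

    Each class score is lam * audio + (1 - lam) * visual, where lam
    grows with SNR, so the audio stream dominates in clean conditions
    and the visual stream dominates in noise.
    """
    lam = audio_weight(snr_db)
    scores = {
        label: lam * audio_logp[label] + (1.0 - lam) * visual_logp[label]
        for label in audio_logp
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy two-class example with made-up probabilities: the audio
# classifier prefers "b", the visual classifier prefers "d".
audio_logp = {"b": math.log(0.6), "d": math.log(0.4)}
visual_logp = {"b": math.log(0.2), "d": math.log(0.8)}

clean_label, _ = fuse(audio_logp, visual_logp, snr_db=30.0)
noisy_label, _ = fuse(audio_logp, visual_logp, snr_db=-10.0)
print(clean_label, noisy_label)
```

With these toy numbers the fused decision follows the audio stream at 30 dB and the visual stream at -10 dB. In practice the mapping from SNR to weight would be fitted on held-out data rather than fixed by hand.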
Pages: 62-65
Page count: 4