An automatic multimodal speech recognition system with audio and video information

Cited by: 12
Authors
Karpov, A. A. [1, 2]
Affiliations
[1] Russian Acad Sci, St Petersburg Inst Informat & Automat, St Petersburg 196140, Russia
[2] ITMO Univ, St Petersburg, Russia
Funding
Russian Foundation for Basic Research
Keywords
Speech Recognition; Automatic Speech Recognition; Speech Recognition System; Speech Corpus; Automatic Speech Recognition System
DOI
10.1134/S000511791412008X
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The mathematical model and software implementation of an automatic Russian speech recognition system that employs techniques of digital processing and analysis of audiovisual signals from a microphone and a video camera are presented. The description of probabilistic modeling of audiovisual speech based on coupled hidden Markov models, information fusion methods with weight coefficients for audio and video speech modalities, and parametric representation of signals is provided. Quantitative results in multimodal recognition of continuous Russian speech indicate high accuracy and reliability of the automatic system.
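The abstract mentions information fusion with weight coefficients for the audio and video modalities. As an illustration only, the sketch below shows one common way such weighting is done in multi-stream recognizers: a weighted sum of per-hypothesis log-likelihoods, with the two weights constrained to sum to 1. This is an assumption for illustration, not the formulation confirmed by the paper; the function names, the `audio_weight` parameter, and the toy scores are all hypothetical.

```python
import math


def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Combine per-hypothesis log-likelihoods from two streams.

    Assumed convention (not taken from the paper): the stream weights
    are non-negative and sum to 1, so video_weight = 1 - audio_weight.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_ll.keys() != video_ll.keys():
        raise ValueError("both streams must score the same hypotheses")
    video_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_ll[label] + video_weight * video_ll[label]
        for label in audio_ll
    }


def recognize(audio_ll, video_ll, audio_weight):
    """Return the hypothesis with the highest fused log-likelihood."""
    fused = fuse_log_likelihoods(audio_ll, video_ll, audio_weight)
    return max(fused, key=fused.get)


if __name__ == "__main__":
    # Toy scores for two hypothetical word hypotheses.
    audio = {"da": math.log(0.4), "net": math.log(0.6)}
    video = {"da": math.log(0.9), "net": math.log(0.1)}
    for w in (1.0, 0.5, 0.0):
        print(w, recognize(audio, video, w))
```

In this toy example the audio stream alone slightly prefers one hypothesis while the video stream strongly prefers the other, so lowering `audio_weight` (as one might in acoustic noise) changes the decision.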
Pages: 2190-2200
Page count: 11