An automatic multimodal speech recognition system with audio and video information

Cited by: 12
Authors
Karpov, A. A. [1, 2]
Affiliations
[1] Russian Acad Sci, St Petersburg Inst Informat & Automat, St Petersburg 196140, Russia
[2] ITMO Univ, St Petersburg, Russia
Funding
Russian Foundation for Basic Research
Keywords
Speech Recognition; Automatic Speech Recognition; Speech Recognition System; Speech Corpus; Automatic Speech Recognition System
DOI
10.1134/S000511791412008X
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The mathematical model and software implementation of an automatic Russian speech recognition system that employs techniques of digital processing and analysis of audiovisual signals from a microphone and a video camera are presented. The description of probabilistic modeling of audiovisual speech based on coupled hidden Markov models, information fusion methods with weight coefficients for audio and video speech modalities, and parametric representation of signals is provided. Quantitative results in multimodal recognition of continuous Russian speech indicate high accuracy and reliability of the automatic system.
Pages: 2190-2200
Page count: 11
Source
Automation and Remote Control, 2014, 75: 2190-2200