An automatic multimodal speech recognition system with audio and video information

Cited by: 12
Authors
Karpov, A. A. [1, 2]
Affiliations
[1] Russian Acad Sci, St Petersburg Inst Informat & Automat, St Petersburg 196140, Russia
[2] ITMO Univ, St Petersburg, Russia
Funding
Russian Foundation for Basic Research
Keywords
Speech Recognition; Automatic Speech Recognition; Speech Recognition System; Speech Corpus; Automatic Speech Recognition System
DOI
10.1134/S000511791412008X
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The mathematical model and software implementation of an automatic Russian speech recognition system that employs techniques of digital processing and analysis of audiovisual signals from a microphone and a video camera are presented. The description of probabilistic modeling of audiovisual speech based on coupled hidden Markov models, information fusion methods with weight coefficients for audio and video speech modalities, and parametric representation of signals is provided. Quantitative results in multimodal recognition of continuous Russian speech indicate high accuracy and reliability of the automatic system.
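The abstract mentions information fusion with weight coefficients for the audio and video modalities but does not give the formula. As a rough illustration only, the sketch below assumes a common scheme: each candidate hypothesis gets a log-likelihood from each stream, and the two are combined as a weighted sum whose weights add up to one. The function names, the sum-to-one constraint, and the toy scores are all assumptions of this sketch, not details taken from the paper.

```python
import math


def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Combine per-hypothesis log-likelihoods from two streams.

    audio_ll, video_ll: dicts mapping hypothesis -> log-likelihood.
    audio_weight: weight in [0, 1]; the video weight is taken to be
    1 - audio_weight (an assumption of this sketch).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    video_weight = 1.0 - audio_weight
    # Only hypotheses scored by both streams can be fused.
    shared = audio_ll.keys() & video_ll.keys()
    return {
        hyp: audio_weight * audio_ll[hyp] + video_weight * video_ll[hyp]
        for hyp in shared
    }


def best_hypothesis(fused):
    """Return the hypothesis with the highest fused score."""
    return max(fused, key=fused.get)


# Toy scores (hypothetical): audio slightly prefers "net",
# video strongly prefers "da".
audio_ll = {"da": math.log(0.4), "net": math.log(0.6)}
video_ll = {"da": math.log(0.9), "net": math.log(0.1)}

audio_only = best_hypothesis(fuse_log_likelihoods(audio_ll, video_ll, 1.0))
balanced = best_hypothesis(fuse_log_likelihoods(audio_ll, video_ll, 0.5))
```

With the audio weight at 1.0 the decision follows the audio stream alone; lowering it lets the video stream override a weak acoustic preference, which is the usual motivation for lowering the audio weight when the acoustic signal is noisy.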
Pages: 2190-2200
Page count: 11