An automatic multimodal speech recognition system with audio and video information

Cited by: 12

Authors:

Karpov, A. A. [1, 2]

Affiliations:

[1] Russian Acad Sci, St Petersburg Inst Informat & Automat, St Petersburg 196140, Russia

[2] ITMO Univ, St Petersburg, Russia

Source:

AUTOMATION AND REMOTE CONTROL | 2014 / Vol. 75 / No. 12

Funding:

Russian Foundation for Basic Research;

Keywords:

Speech Recognition; Automatic Speech Recognition; Speech Recognition System; Speech Corpus; Automatic Speech Recognition System;

DOI:

10.1134/S000511791412008X

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

The mathematical model and software implementation of an automatic Russian speech recognition system that employs techniques of digital processing and analysis of audiovisual signals from a microphone and a video camera are presented. The description of probabilistic modeling of audiovisual speech based on coupled hidden Markov models, information fusion methods with weight coefficients for audio and video speech modalities, and parametric representation of signals is provided. Quantitative results in multimodal recognition of continuous Russian speech indicate high accuracy and reliability of the automatic system.

Pages: 2190 - 2200

Page count: 11

50 records in total

[31] Multimodal information fusion using the iterative decoding algorithm and its application to audio-visual speech recognition
Shivappa, Shankar T.
Rao, Bhaskar D.
Trivedi, Mohan M.
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 2241 - 2244
[32] Audio and Video-based Emotion Recognition using Multimodal Transformers
John, Vijay
Kawanishi, Yasutomo
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2582 - 2588
[33] Usefulness of glottal excitation source information for audio-visual speech recognition system
Nandakishor S.
Pati D.
International Journal of Speech Technology, 2023, 26 (04) : 933 - 945
[34] INFORMATION RETRIEVAL METHODS FOR AUTOMATIC SPEECH RECOGNITION
Xiao, Xiaoqiang
Droppo, Jasha
Acero, Alex
2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2010, : 5550 - 5553
[35] Prosodic and accentual information for automatic speech recognition
Milone, DH
Rubio, AJ
IEEE TRANSACTIONS ON SPEECH AND AUDIO PROCESSING, 2003, 11 (04) : 321 - 333
[36] A multimodal emotion recognition model integrating speech, video and MoCAP
Jia, Ning
Zheng, Chunjun
Sun, Wei
MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (22) : 32265 - 32286
[37] A multimodal emotion recognition model integrating speech, video and MoCAP
Ning Jia
Chunjun Zheng
Wei Sun
Multimedia Tools and Applications, 2022, 81 : 32265 - 32286
[38] The AhoSR Automatic Speech Recognition System
Odriozola, Igor
Serrano, Luis
Hernaez, Inma
Navas, Eva
ADVANCES IN SPEECH AND LANGUAGE TECHNOLOGIES FOR IBERIAN LANGUAGES, IBERSPEECH 2014, 2014, 8854 : 279 - 288
[39] AN AUTOMATIC SPEECH RECOGNITION SYSTEM TABARCA
BENEDI, JM
CASACUBERTA, F
VIDAL, E
REVISTA DE INFORMATICA Y AUTOMATICA, 1990, 23 (01) : 15 - 24
[40] Automatic Indexing Algorithm of Golf Video Using Audio Information
Kim, Hyoung-Gook
JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2009, 28 (05) : 441 - 446