An automatic multimodal speech recognition system with audio and video information

Cited by: 12

Authors:

Karpov, A. A. [1, 2]

Affiliations:

[1] Russian Acad Sci, St Petersburg Inst Informat & Automat, St Petersburg 196140, Russia

[2] ITMO Univ, St Petersburg, Russia

Source:

AUTOMATION AND REMOTE CONTROL | 2014 / Vol. 75 / No. 12

Funding:

Russian Foundation for Basic Research

Keywords:

Speech Recognition; Automatic Speech Recognition; Speech Recognition System; Speech Corpus; Automatic Speech Recognition System;

DOI:

10.1134/S000511791412008X

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Subject classification code:

0812

Abstract:

The mathematical model and software implementation of an automatic Russian speech recognition system that employs techniques of digital processing and analysis of audiovisual signals from a microphone and a video camera are presented. The description of probabilistic modeling of audiovisual speech based on coupled hidden Markov models, information fusion methods with weight coefficients for audio and video speech modalities, and parametric representation of signals is provided. Quantitative results in multimodal recognition of continuous Russian speech indicate high accuracy and reliability of the automatic system.
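
The abstract mentions information fusion with weight coefficients for the audio and video modalities but does not state the formula. As a minimal sketch, assuming the weighted log-likelihood combination commonly used in audio-visual speech recognition (not necessarily the paper's method), the idea can be illustrated as follows. The function names, the labels, and the constraint that the two weights sum to 1 are illustrative assumptions.

```python
import math


def fuse_log_likelihoods(audio_loglik, video_loglik, audio_weight):
    """Combine per-hypothesis log-likelihoods from two modalities.

    Assumption: the video weight is 1 - audio_weight, so the weights sum to 1.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    video_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_loglik[label]
        + video_weight * video_loglik[label]
        for label in audio_loglik
    }


def recognize(audio_loglik, video_loglik, audio_weight):
    """Return the hypothesis with the highest fused score."""
    fused = fuse_log_likelihoods(audio_loglik, video_loglik, audio_weight)
    return max(fused, key=fused.get)


# Hypothetical scores for two candidate words from each modality.
audio_scores = {"da": math.log(0.6), "net": math.log(0.4)}
video_scores = {"da": math.log(0.2), "net": math.log(0.8)}

print(recognize(audio_scores, video_scores, audio_weight=1.0))
print(recognize(audio_scores, video_scores, audio_weight=0.0))
```

In such a scheme the audio weight would typically be lowered when the acoustic signal is noisy, so that the visual stream contributes more to the decision.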

Pages: 2190-2200

Page count: 11

Related records: 50 in total

[41] Training of Automatic Speech Recognition System on Noised Speech
Prodeus, Arkadiy
Kukharicheva, Kateryna
2016 4TH INTERNATIONAL CONFERENCE ON METHODS AND SYSTEMS OF NAVIGATION AND MOTION CONTROL (MSNMC), 2016, : 221 - 223
[42] REMOTE ACCESS AUDIO/VIDEO INFORMATION SYSTEM
CROSSMAN, DM
LIBRARY TRENDS, 1971, 19 (04) : 437 - &
[43] A Multimodal Emotion Recognition System from Video
Thushara, S.
Veni, S.
PROCEEDINGS OF IEEE INTERNATIONAL CONFERENCE ON CIRCUIT, POWER AND COMPUTING TECHNOLOGIES (ICCPCT 2016), 2016,
[44] Multistage information fusion for audio-visual speech recognition
Chu, SM
Libal, V
Marcheret, E
Neti, C
Potamianos, G
2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), VOLS 1-3, 2004, : 1651 - 1654
[45] Information Fusion Techniques in Audio-Visual Speech Recognition
Karabalkan, H.
Erdogan, H.
2009 IEEE 17TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2009, : 734 - 737
[46] Towards Visualizing and Detecting Audio Adversarial Examples for Automatic Speech Recognition
Zong, Wei
Chow, Yang-Wai
Susilo, Willy
INFORMATION SECURITY AND PRIVACY, ACISP 2021, 2021, 13083 : 531 - 549
[47] Multimodal Corpus Design for Audio-Visual Speech Recognition in Vehicle Cabin
Kashevnik, Alexey
Lashkov, Igor
Axyonov, Alexandr
Ivanko, Denis
Ryumin, Dmitry
Kolchin, Artem
Karpov, Alexey
IEEE ACCESS, 2021, 9 : 34986 - 35003
[48] Auxiliary Loss Multimodal GRU Model in Audio-Visual Speech Recognition
Yuan, Yuan
Tian, Chunlin
Lu, Xiaoqiang
IEEE ACCESS, 2018, 6 : 5573 - 5583
[49] Multimodal Integration for Large-Vocabulary Audio-Visual Speech Recognition
Yu, Wentao
Zeiler, Steffen
Kolossa, Dorothea
28TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO 2020), 2021, : 341 - 345
[50] Multimodal Speech Emotion Recognition using Cross Attention with Aligned Audio and Text
Lee, Yoonhyung
Yoon, Seunghyun
Jung, Kyomin
INTERSPEECH 2020, 2020, : 2717 - 2721