JOINTLY RECOGNIZING MULTI-SPEAKER CONVERSATIONS

Times cited: 1

Authors:

Ji, Gang [1]

Bilmes, Jeff [1]

Affiliation:

[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA

Source:

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010

Keywords:

Speech recognition; multi-speaker; graphical models

DOI:

10.1109/ICASSP.2010.5495041

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

We suggest an approach to speech recognition in which multiple sides of a conversation in a dialog or meeting are processed and decoded jointly rather than independently. We also introduce a practical implementation of this approach that improves both language model perplexity and speech recognition word error rate on conversational telephone speech. Specifically, we show that these benefits can be obtained if an n-gram language model, in addition to conditioning on the immediately preceding words in an utterance, is also allowed to condition on the estimated dialog act of the immediately preceding utterance of the other speaker.
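The conditioning idea in the abstract can be illustrated with a small language model that scores each word as P(w_i | w_{i-1}, a), where a is the dialog act of the other speaker's immediately preceding utterance. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses a bigram instead of a general n-gram, treats dialog acts as given labels rather than estimated ones, and linearly interpolates a dialog-act-conditioned estimate with an add-k smoothed bigram that ignores the dialog act (the abstract does not say how the paper smooths or backs off). The class name `DialogActBigramLM`, the parameters `lam` and `k`, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"


class DialogActBigramLM:
    """Hypothetical bigram LM that also conditions on the dialog act (DA) of the
    other speaker's preceding utterance. Assumption: a DA-conditioned maximum-
    likelihood bigram is linearly interpolated with a DA-independent add-k bigram."""

    def __init__(self, lam=0.5, k=1.0):
        self.lam = lam                     # weight on the DA-conditioned estimate
        self.k = k                         # add-k constant for the base bigram
        self.base = defaultdict(Counter)   # history word -> next-word counts
        self.da = defaultdict(Counter)     # (dialog act, history word) -> next-word counts
        self.vocab = {EOS}                 # predictable tokens (BOS is never predicted)

    @staticmethod
    def _bigrams(words):
        tokens = [BOS] + list(words) + [EOS]
        return zip(tokens[:-1], tokens[1:])

    def train(self, data):
        """data: iterable of (words, dialog act of the other speaker's previous utterance)."""
        for words, act in data:
            self.vocab.update(words)
            for h, w in self._bigrams(words):
                self.base[h][w] += 1
                self.da[(act, h)][w] += 1

    def prob(self, w, h, act):
        counts = self.base.get(h, Counter())
        p_base = (counts[w] + self.k) / (sum(counts.values()) + self.k * len(self.vocab))
        da_counts = self.da.get((act, h))
        if not da_counts:
            return p_base  # (act, history) never seen in training: use the base model only
        p_da = da_counts[w] / sum(da_counts.values())
        return self.lam * p_da + (1.0 - self.lam) * p_base

    def perplexity(self, data):
        log_prob, n = 0.0, 0
        for words, act in data:
            for h, w in self._bigrams(words):
                log_prob += math.log(self.prob(w, h, act))
                n += 1
        return math.exp(-log_prob / n)


# Toy data (invented for illustration): each utterance is paired with the
# dialog act of the other speaker's immediately preceding utterance.
train = [
    ("yes i do".split(), "question"),
    ("yeah i think so".split(), "question"),
    ("uh-huh".split(), "statement"),
    ("right".split(), "statement"),
]
test = [("yes i think so".split(), "question")]

with_da = DialogActBigramLM(lam=0.5)
with_da.train(train)
no_da = DialogActBigramLM(lam=0.0)  # lam=0 ignores the dialog act entirely
no_da.train(train)
print(f"perplexity with DA: {with_da.perplexity(test):.3f}, without: {no_da.perplexity(test):.3f}")
```

Setting `lam=0.0` recovers a baseline that ignores the other speaker, so both models can be scored on the same held-out data. On this toy data the dialog-act model yields the lower perplexity, which says nothing about results on a real corpus.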

Pages: 5110-5113

Number of pages: 4