JOINTLY RECOGNIZING MULTI-SPEAKER CONVERSATIONS

Cited by: 1

Authors:

Ji, Gang [1]

Bilmes, Jeff [1]

Affiliations:

[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA

Source:

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010

Keywords:

Speech recognition; multi-speaker; graphical models

DOI:

10.1109/ICASSP.2010.5495041

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification codes:

070206; 082403

Abstract:

We suggest an approach to speech recognition where multiple sides of a conversation in a dialog or meeting are processed and decoded jointly rather than independently. We moreover introduce a practical implementation of this approach that demonstrates both language model perplexity and speech recognition word error rate improvements in conversational telephone speech. Specifically, we show that such benefits can be had if an n-gram language model, in addition to conditioning on immediately preceding words in an utterance, is also allowed to condition on the estimated dialog act of the immediately preceding utterance of an alternate speaker.
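The conditioning idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' implementation (the paper's keywords point to graphical models, and the abstract does not give details); it is a minimal add-k smoothed bigram model, assuming the dialog act of the other speaker's preceding utterance is already known as a plain label. The class name `DialogActBigramLM`, the smoothing constant `k`, the act labels, and the toy data are all hypothetical.

```python
from collections import defaultdict
import math


class DialogActBigramLM:
    """Add-k smoothed bigram model P(w_t | w_{t-1}, d), where d is the
    dialog act of the other speaker's immediately preceding utterance.
    Illustrative only; not the model used in the paper."""

    def __init__(self, k=0.5):
        self.k = k
        self.context_counts = defaultdict(int)  # (prev_word, act) -> count
        self.ngram_counts = defaultdict(int)    # (prev_word, act, word) -> count
        self.vocab = {"</s>"}

    def train(self, samples):
        # samples: iterable of (other_speaker_dialog_act, list_of_words)
        for act, words in samples:
            prev = "<s>"
            for word in list(words) + ["</s>"]:
                self.ngram_counts[(prev, act, word)] += 1
                self.context_counts[(prev, act)] += 1
                self.vocab.add(word)
                prev = word

    def prob(self, word, prev, act):
        # Add-k smoothing over the training vocabulary.
        num = self.ngram_counts.get((prev, act, word), 0) + self.k
        den = self.context_counts.get((prev, act), 0) + self.k * len(self.vocab)
        return num / den

    def perplexity(self, samples):
        log_sum, n = 0.0, 0
        for act, words in samples:
            prev = "<s>"
            for word in list(words) + ["</s>"]:
                log_sum += math.log(self.prob(word, prev, act))
                n += 1
                prev = word
        return math.exp(-log_sum / n)


# Hypothetical toy data: (dialog act of the other speaker's previous
# utterance, words of the current utterance).
toy_data = [
    ("question", ["yes", "i", "do"]),
    ("question", ["no", "i", "do", "not"]),
    ("statement", ["uh-huh"]),
    ("statement", ["right"]),
]

lm = DialogActBigramLM()
lm.train(toy_data)
p_after_question = lm.prob("yes", "<s>", "question")
p_after_statement = lm.prob("yes", "<s>", "statement")
print(p_after_question, p_after_statement)
```

In this toy setup, "yes" is more probable at the start of an utterance that follows a question than one that follows a statement, which is the kind of cross-speaker dependency the abstract describes. A real system would also have to estimate the dialog act rather than read it from a label.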

Pages: 5110-5113

Page count: 4
