JOINTLY RECOGNIZING MULTI-SPEAKER CONVERSATIONS

Times cited: 1

Authors:

Ji, Gang [1]

Bilmes, Jeff [1]

Affiliation:

[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA

Source:

2010 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2010

Keywords:

Speech recognition; multi-speaker; graphical models

DOI:

10.1109/ICASSP.2010.5495041

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Discipline classification code:

070206; 082403

Abstract:

We suggest an approach to speech recognition where multiple sides of a conversation in a dialog or meeting are processed and decoded jointly rather than independently. We moreover introduce a practical implementation of this approach that demonstrates both language model perplexity and speech recognition word error rate improvements in conversational telephone speech. Specifically, we show that such benefits can be had if an n-gram language model, in addition to conditioning on immediately preceding words in an utterance, is also allowed to condition on the estimated dialog-act of the immediately preceding utterance of an alternate speaker.
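The abstract's central modeling idea lets an n-gram language model condition on the dialog act of the other speaker's preceding utterance. The sketch below illustrates that idea on toy data. It is not the authors' implementation. It assumes a bigram model, add-one smoothing, and linear interpolation with a plain bigram through a weight `lam`. It also assumes the dialog-act labels are supplied directly rather than estimated. The class name `DialogActBigramLM`, the helper `perplexity`, and the tags `qy`/`sd` are all hypothetical.

```python
import math
from collections import Counter


class DialogActBigramLM:
    """Hypothetical sketch: P(w_i | w_{i-1}, d), where d is the dialog act of the
    other speaker's immediately preceding utterance (assumed given, not estimated).
    A DA-conditioned bigram is linearly interpolated with a plain bigram."""

    def __init__(self, lam=0.5):
        self.lam = lam                # assumed interpolation weight for the DA-conditioned term
        self.vocab = set()            # predictable tokens (words plus </s>)
        self.bigram = Counter()       # counts of (prev, w)
        self.context = Counter()      # counts of prev
        self.da_bigram = Counter()    # counts of (d, prev, w)
        self.da_context = Counter()   # counts of (d, prev)

    def train(self, utterances):
        """utterances: list of (words, d), d = other speaker's preceding dialog act."""
        for words, d in utterances:
            tokens = ["<s>"] + words + ["</s>"]
            self.vocab.update(tokens[1:])
            for prev, w in zip(tokens, tokens[1:]):
                self.bigram[(prev, w)] += 1
                self.context[prev] += 1
                self.da_bigram[(d, prev, w)] += 1
                self.da_context[(d, prev)] += 1

    def prob(self, w, prev, d):
        v = len(self.vocab)
        # Add-one smoothing on both estimates (an assumption made for brevity).
        p_plain = (self.bigram[(prev, w)] + 1) / (self.context[prev] + v)
        p_da = (self.da_bigram[(d, prev, w)] + 1) / (self.da_context[(d, prev)] + v)
        return self.lam * p_da + (1 - self.lam) * p_plain


def perplexity(lm, utterances):
    """Per-token perplexity, counting each word and the end-of-utterance token."""
    log_prob, n_tokens = 0.0, 0
    for words, d in utterances:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, w in zip(tokens, tokens[1:]):
            log_prob += math.log(lm.prob(w, prev, d))
            n_tokens += 1
    return math.exp(-log_prob / n_tokens)


# Toy data: each utterance is paired with a hypothetical dialog-act tag for the other
# speaker's previous turn, e.g. "qy" for a yes/no question, "sd" for a statement.
toy = [
    (["yes", "i", "do"], "qy"),
    (["yes", "it", "is"], "qy"),
    (["uh-huh", "i", "see"], "sd"),
    (["okay", "i", "see"], "sd"),
]
baseline = DialogActBigramLM(lam=0.0)  # ignores the cross-speaker dialog act
joint = DialogActBigramLM(lam=0.5)
for lm in (baseline, joint):
    lm.train(toy)
print(f"baseline: {perplexity(baseline, toy):.3f}  DA-conditioned: {perplexity(joint, toy):.3f}")
```

With `lam=0.0` the model reduces to an ordinary bigram. Comparing the two perplexities therefore isolates the contribution of the cross-speaker context. Scoring the training text here is only illustrative. On real data the weight would be tuned and perplexity measured on held-out conversations.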


Pages: 5110-5113

Page count: 4
