JOINTLY RECOGNIZING MULTI-SPEAKER CONVERSATIONS

Cited by: 1
Authors
Ji, Gang [1]
Bilmes, Jeff [1]
Affiliations
[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
Keywords
Speech recognition; multi-speaker; graphical models
DOI
10.1109/ICASSP.2010.5495041
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We suggest an approach to speech recognition where multiple sides of a conversation in a dialog or meeting are processed and decoded jointly rather than independently. We moreover introduce a practical implementation of this approach that demonstrates both language model perplexity and speech recognition word error rate improvements in conversational telephone speech. Specifically, we show that such benefits can be had if an n-gram language model, in addition to conditioning on immediately preceding words in an utterance, is also allowed to condition on the estimated dialog act of the immediately preceding utterance of an alternate speaker.
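The conditioning idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `DialogActBigramLM`, the use of a bigram order, add-one smoothing, the dialog-act labels, and the example utterances are all assumptions made for illustration, and the dialog act is taken as given here rather than estimated.

```python
import math
from collections import defaultdict


class DialogActBigramLM:
    """Toy bigram model P(word | previous word, dialog act of the other
    speaker's preceding utterance), with add-one smoothing (an assumption)."""

    def __init__(self):
        # counts[(prev_word, dialog_act)][word] -> occurrences in training data
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = {"</s>"}

    def train(self, data):
        # data: iterable of (dialog_act, list_of_words) pairs
        for act, words in data:
            prev = "<s>"
            for w in words + ["</s>"]:
                self.counts[(prev, act)][w] += 1
                self.vocab.add(w)
                prev = w

    def prob(self, word, prev, act):
        ctx = self.counts.get((prev, act), {})
        total = sum(ctx.values())
        return (ctx.get(word, 0) + 1) / (total + len(self.vocab))

    def perplexity(self, data):
        log_sum, n = 0.0, 0
        for act, words in data:
            prev = "<s>"
            for w in words + ["</s>"]:
                log_sum += math.log(self.prob(w, prev, act))
                n += 1
                prev = w
        return math.exp(-log_sum / n)


# Hypothetical training data: (dialog act of the other speaker's last
# utterance, words of the current utterance).
train = [
    ("question", ["yes", "i", "do"]),
    ("question", ["no"]),
    ("statement", ["uh-huh"]),
    ("statement", ["right"]),
]
lm = DialogActBigramLM()
lm.train(train)

p_after_question = lm.prob("yes", "<s>", "question")
p_after_statement = lm.prob("yes", "<s>", "statement")
print(p_after_question > p_after_statement)
```

In this sketch, an utterance-initial "yes" receives more probability after the other speaker's question than after a statement, which is the kind of cross-speaker dependency the abstract describes exploiting.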
Pages: 5110-5113
Number of pages: 4