JOINTLY RECOGNIZING MULTI-SPEAKER CONVERSATIONS

Cited by: 1
Authors
Ji, Gang [1]
Bilmes, Jeff [1]
Affiliations
[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
Keywords
Speech recognition; multi-speaker; graphical models
DOI
10.1109/ICASSP.2010.5495041
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We suggest an approach to speech recognition where multiple sides of a conversation in a dialog or meeting are processed and decoded jointly rather than independently. We moreover introduce a practical implementation of this approach that demonstrates both language model perplexity and speech recognition word error rate improvements in conversational telephone speech. Specifically, we show that such benefits can be had if an n-gram language model, in addition to conditioning on immediately preceding words in an utterance, is also allowed to condition on the estimated dialog-act of the immediately preceding utterance of an alternate speaker.
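To make the conditioning idea in the abstract concrete, the following is a minimal sketch of a bigram model whose word probabilities also depend on a dialog-act label for the other speaker's preceding utterance. It is an illustration only, not the authors' implementation: the class name `DialogActBigramLM`, the act labels `"question"` and `"statement"`, the add-k smoothing, and the toy training data are all assumptions introduced here, and the dialog act is taken as given rather than estimated.

```python
import math
from collections import defaultdict


class DialogActBigramLM:
    """Toy add-k smoothed bigram model P(word | previous word, dialog act).

    The dialog act is that of the other speaker's preceding utterance.
    Hypothetical illustration; labels and smoothing are assumptions.
    """

    def __init__(self, k=1.0):
        self.k = k
        self.context_counts = defaultdict(int)  # (act, prev) -> count
        self.event_counts = defaultdict(int)    # (act, prev, word) -> count
        self.vocab = {"</s>"}                   # predictable tokens

    def train(self, examples):
        # examples: iterable of (dialog_act, list_of_words) pairs
        for act, words in examples:
            tokens = ["<s>"] + list(words) + ["</s>"]
            self.vocab.update(words)
            for prev, word in zip(tokens, tokens[1:]):
                self.context_counts[(act, prev)] += 1
                self.event_counts[(act, prev, word)] += 1

    def prob(self, word, prev, act):
        # Add-k estimate; sums to 1 over the training vocabulary.
        num = self.event_counts.get((act, prev, word), 0) + self.k
        den = self.context_counts.get((act, prev), 0) + self.k * len(self.vocab)
        return num / den

    def perplexity(self, examples):
        # Per-token perplexity, counting the end-of-utterance token.
        log_sum, n = 0.0, 0
        for act, words in examples:
            tokens = ["<s>"] + list(words) + ["</s>"]
            for prev, word in zip(tokens, tokens[1:]):
                log_sum += math.log(self.prob(word, prev, act))
                n += 1
        return math.exp(-log_sum / n)


# Toy data: each utterance is paired with the (assumed known) dialog act
# of the other speaker's immediately preceding utterance.
train = [
    ("question", ["yes", "i", "do"]),
    ("question", ["no", "i", "dont"]),
    ("statement", ["i", "see"]),
    ("statement", ["uh", "huh"]),
]

lm = DialogActBigramLM()
lm.train(train)
```

In this toy setup, `lm.prob("yes", "<s>", "question")` is larger than `lm.prob("yes", "<s>", "statement")`, which is the kind of cross-speaker dependency the abstract describes. Words outside the training vocabulary are not handled properly by this sketch.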
Pages: 5110-5113
Number of pages: 4