JOINTLY RECOGNIZING MULTI-SPEAKER CONVERSATIONS

Cited by: 1
Authors
Ji, Gang [1]
Bilmes, Jeff [1]
Affiliations
[1] Univ Washington, Dept Elect Engn, Seattle, WA 98195 USA
Keywords
Speech recognition; multi-speaker; graphical models
DOI
10.1109/ICASSP.2010.5495041
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
We suggest an approach to speech recognition in which the multiple sides of a conversation in a dialog or meeting are processed and decoded jointly rather than independently. We moreover introduce a practical implementation of this approach that demonstrates improvements in both language model perplexity and speech recognition word error rate on conversational telephone speech. Specifically, we show that such benefits can be had if an n-gram language model, in addition to conditioning on the immediately preceding words in an utterance, is also allowed to condition on the estimated dialog act of the immediately preceding utterance of an alternate speaker.
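The conditioning idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the class name `DialogActBigramLM`, the add-one smoothing, the bigram order, the dialog-act labels ("question", "statement"), and the tiny data set are all assumptions made for illustration. The sketch also assumes the other speaker's dialog act is already given, whereas the abstract refers to an estimated dialog act.

```python
import math
from collections import defaultdict


class DialogActBigramLM:
    """Toy bigram LM whose context is (previous word, dialog act of the
    other speaker's preceding utterance). Add-one smoothing; illustrative only."""

    def __init__(self):
        # counts[(prev_word, dialog_act)][word] -> number of occurrences
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = {"</s>"}

    def train(self, utterances):
        # utterances: iterable of (other_speaker_dialog_act, list_of_words)
        for act, words in utterances:
            prev = "<s>"
            for w in words + ["</s>"]:
                self.counts[(prev, act)][w] += 1
                self.vocab.add(w)
                prev = w

    def prob(self, word, prev, act):
        # P(word | prev, act) with add-one smoothing over the vocabulary
        ctx = self.counts.get((prev, act), {})
        total = sum(ctx.values())
        return (ctx.get(word, 0) + 1) / (total + len(self.vocab))

    def perplexity(self, utterances):
        log_sum, n = 0.0, 0
        for act, words in utterances:
            prev = "<s>"
            for w in words + ["</s>"]:
                log_sum += math.log(self.prob(w, prev, act))
                n += 1
                prev = w
        return math.exp(-log_sum / n)


# Hypothetical data: each utterance is paired with the dialog act of the
# other speaker's immediately preceding utterance.
data = [
    ("question", ["yes", "i", "do"]),
    ("statement", ["uh-huh"]),
    ("question", ["no", "i", "do", "not"]),
    ("statement", ["right"]),
]
lm = DialogActBigramLM()
lm.train(data)
p_after_question = lm.prob("yes", "<s>", "question")
p_after_statement = lm.prob("yes", "<s>", "statement")
ppl = lm.perplexity(data)
```

In this toy data, "yes" opens an utterance only after a question from the other speaker, so the model assigns it a higher probability in that context than after a statement. A model that ignores the other speaker's dialog act cannot make that distinction.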
Pages: 5110-5113
Number of pages: 4