Multimodal speaker segmentation in presence of overlapped speech segments

Cited by: 2
Authors
Rozgic, Viktor [1]
Han, Kyu Jeong [1]
Georgiou, Panayiotis G. [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Speech Anal & Interpretat Lab, Viterbi Sch Engn, Los Angeles, CA 90089 USA
DOI
10.1109/ISM.2008.103
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that, on datasets with a significant portion (27.4%) of overlapped speech, the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing, and scores as high as 94.4% on the F-measure scale.
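As a rough illustration of the idea of keeping several local maxima of a GCC-PHAT function (so that two simultaneous talkers can each contribute a peak), the following is a minimal sketch for a single microphone pair. It is not the authors' implementation: the abstract describes a steered-power formulation over a microphone array combined with JPDA, whereas this sketch only computes a pairwise GCC-PHAT curve and picks its strongest peaks. The function names `gcc_phat` and `top_local_maxima` and all parameters are hypothetical.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` (in seconds) with GCC-PHAT.

    Returns the delay, the correlation curve, and the matching lags in samples.
    """
    n = len(sig) + len(ref)              # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)

    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so the curve runs from lag -max_shift to lag +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lags = np.arange(-max_shift, max_shift + 1)
    return lags[np.argmax(cc)] / fs, cc, lags

def top_local_maxima(cc, lags, k=2):
    """Return the lags of the k largest local maxima of `cc` (hypothetical helper)."""
    interior = np.arange(1, len(cc) - 1)
    is_peak = (cc[interior] > cc[interior - 1]) & (cc[interior] >= cc[interior + 1])
    peaks = interior[is_peak]
    best = peaks[np.argsort(cc[peaks])[::-1][:k]]
    return lags[best]
```

With a single talker the curve has one dominant peak; with overlapped speech, taking `k > 1` peaks yields several candidate delays that a data-association step could then assign to speakers.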
Pages: 679-684
Page count: 6