Multimodal speaker segmentation in presence of overlapped speech segments

Cited by: 2
Authors
Rozgic, Viktor [1]
Han, Kyu Jeong [1]
Georgiou, Panayiotis G. [1]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ So Calif, Dept Elect Engn, Speech Anal & Interpretat Lab, Viterbi Sch Engn, Los Angeles, CA 90089 USA
DOI
10.1109/ISM.2008.103
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing on datasets with a significant portion (27.4%) of overlapped speech, and scores as high as 94.4% on the F-measure scale.
Pages: 679-684
Number of pages: 6
Related papers
50 in total
  • [21] Two's a Crowd: Improving Speaker Diarization by Automatically Identifying and Excluding Overlapped Speech
    Boakye, Kofi
    Vinyals, Oriol
    Friedland, Gerald
    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5, 2008, : 32 - 35
  • [22] Entropy Based Overlapped Speech Detection as a Pre-Processing Stage for Speaker Diarization
    Ben-Harush, Oshry
    Lapidot, Itshak
    Guterman, Hugo
    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, 2009, : 908 - +
  • [23] Overlapped Speech Detection and Competing Speaker Counting-Humans Versus Deep Learning
    Andrei, Valentin
    Cucu, Horia
    Burileanu, Corneliu
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2019, 13 (04) : 850 - 862
  • [24] Temporal variability in speech segments of Spanish: context and speaker related differences
    Mendoza, E
    Carballo, G
    Cruz, A
    Fresneda, MD
    Muñoz, J
    Marrero, V
    SPEECH COMMUNICATION, 2003, 40 (04) : 431 - 447
  • [25] Speaker Adaptation Intensively Weighted on Mis-Recognized Speech Segments
    Oku, Takahiro
    Fujita, Yuya
    Kobayashi, Akio
    Imai, Toru
    2012 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2012,
  • [26] Speaker segmentation and adaptation for speech recognition on multiple-speaker audio conference data
    Liu, Zhu
    Saraclar, Murat
    2007 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, VOLS 1-5, 2007, : 192 - +
  • [27] Effect of High-Energy Voiced Speech Segments and Speaker Gender on Shouted Speech Detection
    Baghel, Shikha
    Prasanna, S. R. M.
    Guha, Prithwijit
    2021 NATIONAL CONFERENCE ON COMMUNICATIONS (NCC), 2021, : 53 - 58
  • [28] A pitch-based rapid speech segmentation for speaker indexing
    Yang, M
    Yang, YC
    Wu, ZH
    ISM 2005: SEVENTH IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA, PROCEEDINGS, 2005, : 571 - 576
  • [29] Neural speech turn segmentation and affinity propagation for speaker diarization
    Yin, Ruiqing
    Bredin, Herve
    Barras, Claude
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1393 - 1397
  • [30] Visual Speech Segmentation and Speaker Recognition for Transcription of TV News
    Chaloupka, Josef
    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5, 2006, : 1284 - 1287