Multimodal speaker segmentation in presence of overlapped speech segments

Cited by: 2

Authors:

Rozgic, Viktor [1]

Han, Kyu Jeong [1]

Georgiou, Panayiotis G. [1]

Narayanan, Shrikanth [1]

Affiliations:

[1] Univ So Calif, Dept Elect Engn, Speech Anal & Interpretat Lab, Viterbi Sch Engn, Los Angeles, CA 90089 USA

Source:

ISM: 2008 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA | 2008

Keywords:

DOI:

10.1109/ISM.2008.103

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing on datasets with a significant portion (27.4%) of overlapped speech, and scores as high as 94.4% on the F-measure scale.
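As a rough illustration of the fusion idea in the abstract, the sketch below decodes a speaker-state sequence from per-frame scores of two modalities with a small hidden Markov model. It is not the authors' architecture: the three states (speaker A, speaker B, both overlapping), the sticky transition matrix, the toy likelihood values, and the function name `viterbi_fused` are all assumptions made for this example, and the JPDA-based microphone array likelihood from the paper is not reproduced. The modalities are assumed to be conditionally independent given the state, so their log-likelihoods are simply added.

```python
import math


def viterbi_fused(log_liks, log_trans, log_prior):
    """Return the most likely state path given fused modality scores.

    log_liks:  list of modalities, each a T x S table of log-likelihoods
    log_trans: S x S table, log_trans[i][j] = log P(next = j | current = i)
    log_prior: length-S list of log initial state probabilities
    """
    n_frames = len(log_liks[0])
    n_states = len(log_prior)
    # Assumption: modalities are independent given the state, so scores add.
    emit = [[sum(m[t][s] for m in log_liks) for s in range(n_states)]
            for t in range(n_frames)]

    score = [log_prior[s] + emit[0][s] for s in range(n_states)]
    back = []  # back[k][j] = best predecessor of state j at frame k + 1
    for t in range(1, n_frames):
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j] + emit[t][j])
        back.append(ptr)
        score = new_score

    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def logs(rows):
    """Convert a table of probabilities to log-probabilities."""
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical toy scores. States: 0 = speaker A, 1 = speaker B, 2 = overlap.
mic_array = logs([[0.8, 0.1, 0.1], [0.4, 0.2, 0.4],
                  [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]])
speaker_id = logs([[0.7, 0.2, 0.1], [0.3, 0.2, 0.5],
                   [0.2, 0.2, 0.6], [0.2, 0.7, 0.1]])
log_trans = logs([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
log_prior = [math.log(1 / 3)] * 3

path = viterbi_fused([mic_array, speaker_id], log_trans, log_prior)
print(path)
```

Giving overlap its own state is only one possible design. It keeps the decoder a standard Viterbi pass, at the cost of a state space that grows with the number of speaker combinations.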


Pages: 679 - 684

Page count: 6

50 items in total

[1] Multimodal speaker segmentation and identification in presence of overlapped speech segments
Rozgić V.
Han K.J.
Georgiou P.G.
Narayanan S.
Journal of Multimedia, 2010, 5 (04): 322 - 331
[2] Improved Overlapped Speech Handling for Speaker Diarization
Boakye, Kofi
Vinyals, Oriol
Friedland, Gerald
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011: 948 - +
[3] Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers
Kanda, Naoyuki
Gaur, Yashesh
Wang, Xiaofei
Meng, Zhong
Chen, Zhuo
Zhou, Tianyan
Yoshioka, Takuya
INTERSPEECH 2020, 2020: 36 - 40
[4] Overlapped Speech Detection and speaker counting using distant
Cornell, Samuele
Omologo, Maurizio
Squartini, Stefano
Vincent, Emmanuel
COMPUTER SPEECH AND LANGUAGE, 2022, 72
[5] SELECTION OF FEATURES AND SPEECH SEGMENTS FOR SPEAKER VERIFICATION
LIN, WC
PILLAY, SK
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1975, 58 : S107 - S107
[6] Overlapped speech detection for improved speaker diarization in multiparty meetings
Boakye, Kofi
Trueba-Hornero, Beatriz
Vinyals, Oriol
Friedland, Gerald
2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008: 4353 - 4356
[7] Robust Extraction of Desired Speaker's Utterance in Overlapped Speech
Lu, Haoze
Akaiwa, Yuma
Horiuchi, Yasuo
Kuroiwa, Shingo
ELECTRONICS AND COMMUNICATIONS IN JAPAN, 2016, 99 (11) : 80 - 89
[8] Robust extraction of desired speaker's utterance in overlapped speech
Graduate School of Advanced Integration Science, Chiba University, 1-33, Yayoicho, Inage-ku, Chiba-shi, Chiba
263-8522, Japan
IEEJ Trans. Electron. Inf. Syst., 8 (1009-1016):
[9] Speech Enhancement for Multimodal Speaker Diarization System
Ahmad, Rehan
Zubair, Syed
Alquhayz, Hani
IEEE ACCESS, 2020, 8 : 126671 - 126680
[10] Multimodal Speaker Identification Based on Text and Speech
Moschonas, Panagiotis
Kotropoulos, Constantine
BIOMETRICS AND IDENTITY MANAGEMENT, 2008, 5372 : 100 - 109
