Multimodal speaker segmentation in presence of overlapped speech segments

Cited by: 2

Authors:

Rozgic, Viktor [1]

Han, Kyu Jeong [1]

Georgiou, Panayiotis G. [1]

Narayanan, Shrikanth [1]

Affiliation:

[1] University of Southern California, Department of Electrical Engineering, Speech Analysis and Interpretation Laboratory, Viterbi School of Engineering, Los Angeles, CA 90089 USA

Source:

ISM: 2008 IEEE INTERNATIONAL SYMPOSIUM ON MULTIMEDIA | 2008

DOI:

10.1109/ISM.2008.103

Chinese Library Classification:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that, on datasets with a significant portion (27.4%) of overlapped speech, the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing, and scores as high as 94.4% on the F-measure scale.
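The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind GCC-PHAT for a single microphone pair, together with a helper that keeps several local maxima of the correlation instead of only the strongest one. It is not the authors' SPR-GCC-PHAT or JPDA likelihood model. The function names `gcc_phat` and `top_local_maxima`, the sampling rate, and the lag window are all assumptions made for illustration.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref` with PHAT weighting.

    Illustrative sketch only; names and parameters are hypothetical.
    Returns (delay in seconds, correlation over lags -max_shift..+max_shift).
    """
    n = sig.shape[0] + ref.shape[0]          # zero-pad to avoid wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)                   # cross-power spectrum
    R /= np.abs(R) + 1e-12                   # PHAT: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(cc)) - max_shift
    return shift / fs, cc

def top_local_maxima(cc, k):
    """Return indices of the k largest interior local maxima of `cc`."""
    mid = cc[1:-1]
    is_peak = (mid > cc[:-2]) & (mid >= cc[2:])
    idx = np.flatnonzero(is_peak) + 1
    order = np.argsort(cc[idx])[::-1][:k]
    return idx[order]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(2048)
    sig = np.concatenate((np.zeros(5), ref[:-5]))   # ref delayed by 5 samples
    tau, cc = gcc_phat(sig, ref, fs=16000, max_tau=0.001)
    print("estimated delay (samples):", round(tau * 16000))
    print("candidate peak indices:", top_local_maxima(cc, 2))
```

Keeping more than one peak per frame is what allows several simultaneous talkers to be represented as candidate directions; how those candidates are weighted and associated with participants is the part the paper handles with its JPDA-based likelihood model, which this sketch does not attempt.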

Pages: 679-684

Page count: 6

