Exploring methods of improving speaker accuracy for speaker diarization

Cited by: 0

Authors:

Knox, Mary Tai [1, 2]

Mirghafori, Nikki [1]

Friedland, Gerald [1]

Affiliations:

[1] Int Comp Sci Inst, Berkeley, CA 94704 USA

[2] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA USA

Source:

14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5 | 2013

Keywords:

speaker diarization; cluster purification; temporal; smoothing

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The focus of this work is to improve the speaker diarization error rate, and more specifically the speaker error rate. We investigate two methods of improving the speaker error rate: modifying the minimum duration constraint and incorporating novel purification techniques. First, in the final step of the speaker diarization algorithm, we replace the minimum duration constraint with a simple smoothing algorithm that averages the log-likelihoods for each of the hypothesized speakers. This method improves the speaker error rate by 12% relative for the MDM condition. Second, we use the difference between the largest and second-largest log-likelihoods to identify frames that are believed to be correct (or "pure"). The difference value is shown to be more effective at separating correct frames from incorrect frames than the previously used maximum log-likelihood value. Using only the "pure" frames, the cluster models are retrained and segmentation is performed using the above-mentioned smoothing technique. The proposed combination of purification and smoothing reduces the speaker error rate relative to the baseline; however, it performs worse than the smoothing step alone.
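The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the smoothing window, so a centered moving average is assumed here, and the names `smooth_log_likelihoods`, `top_two_margin`, `select_pure_frames`, `half_window`, and `threshold` are hypothetical. Retraining the cluster models on the selected frames is omitted.

```python
def smooth_log_likelihoods(loglik, half_window):
    """Average each speaker's log-likelihood over a centered window.

    loglik: list of frames, each a list of per-speaker log-likelihoods.
    half_window: assumed parameter; the window covers
    [t - half_window, t + half_window], truncated at the edges.
    """
    n_frames = len(loglik)
    n_speakers = len(loglik[0])
    smoothed = []
    for t in range(n_frames):
        lo = max(0, t - half_window)
        hi = min(n_frames, t + half_window + 1)
        width = hi - lo
        smoothed.append([
            sum(loglik[u][s] for u in range(lo, hi)) / width
            for s in range(n_speakers)
        ])
    return smoothed


def assign_speakers(scores):
    """Label each frame with the index of its highest-scoring speaker."""
    return [max(range(len(frame)), key=frame.__getitem__) for frame in scores]


def top_two_margin(scores):
    """Difference between the largest and second-largest score per frame.

    Requires at least two hypothesized speakers.
    """
    margins = []
    for frame in scores:
        best, second = sorted(frame, reverse=True)[:2]
        margins.append(best - second)
    return margins


def select_pure_frames(scores, threshold):
    """Indices of frames whose top-two margin reaches an assumed threshold."""
    return [t for t, m in enumerate(top_two_margin(scores)) if m >= threshold]


# Toy data: two speakers, one isolated frame that favors speaker 1.
loglik = [[-1.0, -5.0], [-1.0, -5.0], [-4.0, -3.0], [-1.0, -5.0], [-1.0, -5.0]]
raw_labels = assign_speakers(loglik)
smoothed_labels = assign_speakers(smooth_log_likelihoods(loglik, half_window=1))
pure_frames = select_pure_frames(loglik, threshold=2.0)
```

In this toy input, averaging over neighboring frames removes the single-frame speaker change, and the low-margin frame is the one excluded from the "pure" set.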

Pages: 2782-2786

Page count: 5
