Speaker Identification for the Analysis of Joint Attention in Video

Cited by: 0

Authors:

Gonzalez Contreras, Carlos Eduardo [1]

De-la-Torre, Miguel [1]

Gonzalez Becerra, Victor Hugo [1]

Avila-George, Himer [1]

Hernandez Palacio, Raul [2]

Affiliations:

[1] Univ Guadalajara, Ameca, Mexico

[2] Univ Autonoma Estado Hidalgo, Pachuca, Hidalgo, Mexico

Source:

2019 8TH INTERNATIONAL CONFERENCE ON SOFTWARE PROCESS IMPROVEMENT (CIMPS) | 2019

Keywords:

Joint attention; speaker identification; MFCC; GMM; SVM

DOI:

Not available

Chinese Library Classification number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Joint attention (AC) is a human skill essential to individual development, including language learning. Experimental studies of AC commonly analyze video recordings of interactions between individuals, and some elements, including each participant's interventions, are registered manually. This work proposes the design of a speaker identification system for the analysis of AC that provides the sequence of interventions by each speaker in videos of AC scenarios. To support its implementation, a comparison of the most common techniques for speaker identification is provided. For feature extraction, these include Mel Frequency Cepstral Coefficients (MFCC) and MFCC combined with their delta coefficients (MFCC+deltaMFCC). For classification, Gaussian mixture models (GMM) and support vector machines (SVM) were employed. A 5-fold cross-validation with 30 audio segments of 3-4 seconds each yields an accuracy close to 90% using MFCC+deltaMFCC with SVM. This result supports the feasibility of implementing the proposed system.
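To illustrate the MFCC+deltaMFCC feature layout named in the abstract, here is a minimal pure-Python sketch of delta computation. The abstract does not say how the deltas were obtained, so everything here is an assumption: the commonly used regression formula, the window size `n = 2`, the edge handling (repeating the first and last frame), and the function names `delta` and `stack_with_delta`, which are hypothetical. The input is assumed to be a list of already-computed MFCC frames.

```python
from typing import List, Sequence

Frame = List[float]


def delta(frames: Sequence[Sequence[float]], n: int = 2) -> List[Frame]:
    """First-order deltas of a frame sequence via the usual regression formula.

    d_t = sum_{k=1..n} k * (c_{t+k} - c_{t-k}) / (2 * sum_{k=1..n} k^2)

    Assumption: frames past either end are replaced by the nearest edge frame.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    num_frames = len(frames)
    if num_frames == 0:
        return []
    denom = 2.0 * sum(k * k for k in range(1, n + 1))
    dim = len(frames[0])
    out: List[Frame] = []
    for t in range(num_frames):
        row: Frame = []
        for j in range(dim):
            acc = 0.0
            for k in range(1, n + 1):
                # Clamp indices so the window never leaves the sequence.
                ahead = frames[min(t + k, num_frames - 1)][j]
                behind = frames[max(t - k, 0)][j]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_with_delta(frames: Sequence[Sequence[float]], n: int = 2) -> List[Frame]:
    """Append each frame's delta to the frame itself (MFCC+deltaMFCC layout)."""
    return [list(f) + d for f, d in zip(frames, delta(frames, n))]
```

Each stacked vector has twice the dimension of the original MFCC frame. In a pipeline like the one the abstract describes, such vectors (or per-segment statistics of them) would then be passed to a classifier such as a GMM or an SVM.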


Pages: 7
