Speaker Identification for the Analysis of Joint Attention in Video

Cited by: 0
Authors
Gonzalez Contreras, Carlos Eduardo [1]
De-la-Torre, Miguel [1]
Gonzalez Becerra, Victor Hugo [1]
Avila-George, Himer [1]
Hernandez Palacio, Raul [2]
Affiliations
[1] Univ Guadalajara, Ameca, Mexico
[2] Univ Autonoma Estado Hidalgo, Pachuca, Hidalgo, Mexico
Keywords
Joint attention; speaker identification; MFCC; GMM; SVM
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Joint attention (AC) is a human skill essential to individual development, including language learning. Experimental studies of AC commonly involve analyzing video recordings of interactions between individuals, in which some elements, including each participant's interventions, are registered manually. This work proposes the design of a speaker identification system for the analysis of AC that provides the sequence of interventions by each speaker in videos of AC scenarios. To support implementation, a comparison of the most common speaker identification techniques is provided. The features compared are Mel Frequency Cepstral Coefficients (MFCC) and MFCC combined with their delta coefficients (MFCC+deltaMFCC). For classification, Gaussian mixture models (GMM) and support vector machines (SVM) were employed. Results of a 5-fold cross-validation with 30 audio segments lasting 3-4 seconds show an accuracy close to 90% using MFCC+deltaMFCC with SVM. This result supports the feasibility of implementing the proposed system.
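The MFCC+deltaMFCC feature set and the 5-fold protocol named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The regression-based delta formula, the window `width=2`, the edge handling, the round-robin fold assignment, and the helper names `delta`, `append_delta`, and `k_fold_indices` are all assumptions made for illustration. Treating the 30 segments as the items to split is likewise an assumption. MFCC extraction and the SVM classifier are left out, and the input frames are toy values.

```python
def delta(frames, width=2):
    """Regression-based delta coefficients (assumed formula, not from the paper).

    frames: list of per-frame coefficient lists.
    Edges are handled by repeating the first or last frame.
    """
    n_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t in range(n_frames):
        row = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, n_frames - 1)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def append_delta(frames, width=2):
    """Concatenate each frame with its delta: the MFCC+deltaMFCC layout."""
    return [c + d for c, d in zip(frames, delta(frames, width))]


def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists; items are assigned round-robin (assumed)."""
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test


# Toy input: 6 frames, 2 coefficients each (a linear ramp and a constant).
toy_frames = [[float(t), 1.0] for t in range(6)]
features = append_delta(toy_frames)

# 30 items split into 5 folds, mirroring the counts given in the abstract.
folds = list(k_fold_indices(30, k=5))
```

In this sketch each frame doubles in width once the deltas are appended, and every fold holds out 6 of the 30 items. A real pipeline would replace the toy frames with extracted MFCCs and would likely stratify the folds by speaker.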
Pages: 7