Speaker Identification for the Analysis of Joint Attention in Video

Authors
Gonzalez Contreras, Carlos Eduardo [1]
De-la-Torre, Miguel [1]
Gonzalez Becerra, Victor Hugo [1]
Avila-George, Himer [1]
Hernandez Palacio, Raul [2]
Affiliations
[1] Univ Guadalajara, Ameca, Mexico
[2] Univ Autonoma Estado Hidalgo, Pachuca, Hidalgo, Mexico
Keywords
Joint attention; speaker identification; MFCC; GMM; SVM
DOI
Not available
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Joint attention (AC) is a human skill essential to individual development, including language learning. Experimental studies of AC commonly analyze video recordings of interactions between individuals, and some elements, including each participant's interventions, are registered manually. This work proposes the design of a speaker identification system for the analysis of AC, which provides the sequence of interventions by each speaker in videos of AC scenarios. To support the implementation, a comparison of the most common speaker identification techniques is provided. For feature extraction, these include Mel Frequency Cepstral Coefficients (MFCC) and MFCC combined with their delta coefficients (MFCC+deltaMFCC). For classification, Gaussian mixture models (GMM) and support vector machines (SVM) were employed. Results from 5-fold cross-validation on 30 audio segments, each 3 to 4 seconds long, yield an accuracy close to 90% using MFCC+deltaMFCC with SVM. This result supports the feasibility of implementing the proposed system.
Pages: 7
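The abstract names MFCC+deltaMFCC features and 5-fold cross-validation without giving implementation details. The following is a minimal sketch of those two steps only, not the authors' code. It assumes that the deltas are computed with the common regression formula over a window of two frames on each side, that edge frames are repeated at the boundaries, and that the folds are contiguous and unshuffled. The function names `add_deltas` and `kfold_indices` and the toy feature values are hypothetical. The item count of 30 is borrowed from the abstract's segment count purely for illustration.

```python
from typing import List, Sequence, Tuple


def add_deltas(frames: Sequence[Sequence[float]], width: int = 2) -> List[List[float]]:
    """Append first-order delta coefficients to every feature frame.

    Uses d_t = sum_{n=1..width} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..width} n^2),
    clamping t+n and t-n to the valid range (edge frames are repeated).
    The window width of 2 is an assumption, not a value from the abstract.
    """
    if not frames:
        return []
    last = len(frames) - 1
    norm = 2 * sum(n * n for n in range(1, width + 1))
    out = []
    for t, frame in enumerate(frames):
        delta = []
        for k in range(len(frame)):
            acc = 0.0
            for n in range(1, width + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            delta.append(acc / norm)
        out.append(list(frame) + delta)
    return out


def kfold_indices(n_items: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_items) into k contiguous (train, test) index pairs."""
    sizes = [n_items // k + (1 if i < n_items % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_items) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


# Toy stand-in for MFCC frames: one rising coefficient and one constant one.
toy_mfcc = [[float(t), 0.5] for t in range(6)]
features = add_deltas(toy_mfcc)

# Five folds over 30 items, so each test fold holds 6 items.
folds = kfold_indices(30, k=5)
```

In practice the segments would normally be shuffled, and ideally stratified by speaker, before being split, and a GMM or SVM would be trained on the training indices of each fold. Both steps are left out here because the abstract does not describe them.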