A speaker identification system for video content analysis

Cited by: 0
Authors
Bi, Jing [1]
Liu, Shu-Chang [1]
Affiliations
[1] Beijing University of Posts and Telecommunications, Beijing 100088, People's Republic of China
Source
2008 FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PROCEEDINGS | 2008
DOI
10.1109/IIH-MSP.2008.215
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A growing body of literature proposes applying audio content analysis techniques to content-based video parsing. This paper presents our ongoing work on a speaker identification system for video content analysis. The system differs from conventional ones in three respects: first, the soundtrack extracted from a video stream contains not only silence and speech but also music and environmental sound; second, the number of speakers in the video content is unknown; third, noise in the video can significantly degrade system performance. To address these issues, our speaker identification system comprises the following basic parts: audio classification and segmentation using a rule-based and Support Vector Machine (SVM) based classifier; speech clustering using spectral clustering, followed by speaker identification based on a Gaussian Mixture Model (GMM); and speech enhancement based on spectral subtraction. Experiments are carried out on a database extracted from news, conversation, and movie videos. The results confirm the validity of the proposed system architecture.
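As an illustration of the GMM-based identification step named in the abstract, the following is a minimal sketch and not the authors' implementation. The abstract does not specify features, mixture sizes, or training, so the sketch assumes diagonal-covariance mixtures with hand-set parameters over toy 2-D features; all function names, speaker labels, and values are hypothetical. It shows only the decision rule: score the frames of a speech segment under each speaker's GMM and pick the speaker with the highest total log-likelihood.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frame, gmm):
    """Log-likelihood of one frame under a GMM.

    The GMM is a list of (weight, mean, variance) tuples (assumed layout).
    The log-sum-exp trick avoids underflow for frames far from all components.
    """
    terms = [math.log(w) + log_gaussian_diag(frame, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def identify_speaker(frames, speaker_models):
    """Return (best_speaker, scores), scoring all frames under each speaker's GMM."""
    scores = {
        name: sum(gmm_log_likelihood(f, gmm) for f in frames)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Toy, hand-set models over 2-D features (illustrative values only;
# a real system would train each GMM on that speaker's feature vectors).
models = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
}

# Frames of one speech segment, assumed already clustered as a single speaker.
frames = [[5.2, 4.9], [5.8, 6.1], [5.5, 5.4]]
best, scores = identify_speaker(frames, models)
print(best)
```

Summing per-frame log-likelihoods treats the frames as independent, which is the usual simplification for this kind of scoring; in the pipeline the abstract describes, the input frames would come from the segmented, clustered, and enhanced speech.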
Pages: 200-203
Number of pages: 4
Related papers
50 in total
  • [21] Joint audio-video processing for biometric speaker identification
    Kanak, A
    Erzin, E
    Yemez, Y
    Tekalp, AM
    2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PROCEEDINGS: SPEECH II; INDUSTRY TECHNOLOGY TRACKS; DESIGN & IMPLEMENTATION OF SIGNAL PROCESSING SYSTEMS; NEURAL NETWORKS FOR SIGNAL PROCESSING, 2003, : 377 - 380
  • [22] The Speaker and Content Adaptation in Radiology Information System
    Wu, Feiran
    Wang, Xinxin
    Ye, Zhiqian
    MECHANICAL ENGINEERING AND INTELLIGENT SYSTEMS, PTS 1 AND 2, 2012, 195-196 : 859 - 863
  • [23] Speaker identification using cepstral analysis
    Nazar, MN
    ISCON 2002: IEEE STUDENTS CONFERENCE ON EMERGING TECHNOLOGIES, PROCEEDINGS, 2002, : 139 - 143
  • [24] Adaptive Metadata Management System for Distributed Video Content Analysis
    Carincotte, C.
    Desurmont, X.
    Bastide, A.
    ADVANCED CONCEPTS FOR INTELLIGENT VISION SYSTEMS, PROCEEDINGS, 2008, 5259 : 334 - +
  • [25] ANALYSIS OF DNN APPROACHES TO SPEAKER IDENTIFICATION
    Matejka, Pavel
    Glembek, Ondrej
    Novotny, Ondrej
    Plchot, Oldrich
    Grezl, Frantisek
    Burget, Lukas
    Cernocky, Jan "Honza"
    2016 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING PROCEEDINGS, 2016, : 5100 - 5104
  • [26] SPEAKER IDENTIFICATION BY ANALYSIS OF SOUND ISLANDS
    WOOD, CA
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1978, 64 : S183 - S183
  • [27] Screenplay alignment for closed-system speaker identification and analysis of feature films
    Turetsky, R
    Dimitrova, N
    2004 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXP (ICME), VOLS 1-3, 2004, : 1659 - 1662
  • [28] Automatic Speaker Identification Using Clinically Depressed Speech Content
    Memon, Sheeraz
    Shaikh, Faisal Karim
    Baloch, Javed Ali
    MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2012, 31 (02) : 259 - 264
  • [29] Robust video fingerprinting for content-based video identification
    Lee, Sunil
    Yoo, Chang D.
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2008, 18 (07) : 983 - 988
  • [30] WISS, a Speaker Identification System for Mobile Robots
    Grondin, Francois
    Michaud, Francois
    2012 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2012, : 1817 - 1822