A speaker identification system for video content analysis

Cited by: 0

Authors:

Bi, Jing [1]

Liu, Shu-Chang [1]

Affiliation:

[1] Beijing Univ Posts & Telecommun, Beijing 100088, Peoples R China

Source:

2008 FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PROCEEDINGS | 2008

DOI:

10.1109/IIH-MSP.2008.215

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Recently, a growing body of literature has proposed applying audio content analysis techniques to content-based video parsing. This paper presents our current work on a speaker identification system for video content analysis. The system differs from conventional ones in three respects: first, the soundtrack extracted from a video stream contains not only silence and speech but also music and environmental sound; second, the number of speakers in the video content is not known in advance; third, noise in the video can significantly degrade system performance. To address these issues, our speaker identification system comprises the following basic parts: audio classification and segmentation using rule-based and Support Vector Machine (SVM) based classifiers; speech clustering using spectral clustering, followed by speaker identification based on Gaussian Mixture Models (GMM); and speech enhancement based on spectral subtraction. Experiments are carried out on a database extracted from news, conversation, and movie videos. The results confirm the validity of the proposed system architecture.
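
The abstract names spectral subtraction as the speech enhancement step but gives no details. As a rough illustration only, the following is a minimal sketch of generic textbook magnitude spectral subtraction with overlap-add resynthesis, not the authors' implementation. The function name, the parameters (`noise_frames`, `frame_len`, `hop`, `floor`), and the assumption that the first few frames contain only noise are all hypothetical choices made for this example.

```python
import numpy as np

def spectral_subtraction(noisy, noise_frames=5, frame_len=256, hop=128, floor=0.02):
    """Generic magnitude spectral subtraction (illustrative sketch).

    Assumption: the first `noise_frames` frames of `noisy` hold noise only,
    so their average magnitude spectrum can serve as the noise estimate.
    """
    noisy = np.asarray(noisy, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(noisy) - frame_len) // hop

    # Split the signal into overlapping, windowed frames.
    frames = np.stack([
        noisy[i * hop : i * hop + frame_len] * window for i in range(n_frames)
    ])
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Estimate the noise magnitude spectrum from the leading frames.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the estimate; keep a small spectral floor instead of
    # letting magnitudes go negative.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Resynthesize with the noisy phase and weighted overlap-add.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for i in range(n_frames):
        start = i * hop
        out[start : start + frame_len] += clean_frames[i] * window
        norm[start : start + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)
```

In a pipeline like the one the abstract describes, a step of this kind would presumably run on speech segments before GMM scoring; where the noise estimate comes from in the actual system is not stated in the abstract.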

Pages: 200-203

Page count: 4
