A speaker identification system for video content analysis

Cited by: 0
Authors
Bi, Jing [1]
Liu, Shu-Chang [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing 100088, Peoples R China
Source
2008 FOURTH INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATION HIDING AND MULTIMEDIA SIGNAL PROCESSING, PROCEEDINGS | 2008
Keywords
DOI
10.1109/IIH-MSP.2008.215
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent literature increasingly proposes applying audio content analysis techniques to content-based video parsing. This paper presents our ongoing work on a speaker identification system for video content analysis. The system differs from conventional speaker identification systems in three respects: first, the soundtrack extracted from a video stream contains not only silence and speech but also music and environmental sound; second, the number of speakers in the video content is unknown; third, noise in the video can significantly degrade system performance. To address these issues, the system comprises the following components: audio classification and segmentation using rule-based and Support Vector Machine (SVM) based classifiers; speech clustering using spectral clustering; speaker identification based on Gaussian Mixture Models (GMM); and speech enhancement based on spectral subtraction. Experiments are carried out on a database extracted from news, conversation, and movie videos. The results confirm the validity of the proposed system architecture.
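The abstract names GMM-based speaker identification but gives no implementation details. As a rough illustration only, the sketch below scores a speech segment against one diagonal-covariance GMM per speaker and picks the model with the highest average log-likelihood. The decision rule is the common maximum-likelihood one and is an assumption here, not the authors' confirmed method. The function names, the two-dimensional features, and the toy model parameters are all hypothetical; a real system would train the models on acoustic features such as cepstral coefficients.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of `frames` under a GMM.

    `gmm` is a list of (weight, mean, var) components (hypothetical layout).
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over mixture components for numerical stability.
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total / len(frames)


def identify_speaker(frames, models):
    """Return the best-scoring speaker name and all per-speaker scores."""
    scores = {name: gmm_log_likelihood(frames, gmm) for name, gmm in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy models: two speakers, two mixture components each.
MODELS = {
    "speaker_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "speaker_b": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}

# A made-up speech segment of three feature frames.
segment = [[4.2, 3.9], [4.8, 5.1], [4.5, 4.4]]
best, scores = identify_speaker(segment, MODELS)
print(best)  # prints "speaker_b"
```

Because the number of speakers is described as unknown, a practical system would also need a rejection threshold or a clustering step before this scoring; the sketch omits both.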
Pages: 200-203
Number of pages: 4