Speech/Nonspeech Segmentation in Web Videos

Authors
Misra, Ananya [1]
Affiliations
[1] Google, New York, NY USA
Keywords
segmentation; speech detection; voice activity detection; video; classification
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Speech transcription of web videos requires first detecting the segments that contain transcribable speech. We refer to this step as segmentation. Commonly used segmentation techniques are inadequate for domains such as YouTube, where videos may have a wide variety of background and recording conditions. In this work, we investigate alternative audio features and a discriminative classifier, which together yield a lower frame error rate (25.3%) on YouTube videos than the commonly used Gaussian mixture models trained on cepstral features (30.6%). The alternative audio features perform particularly well in noisy conditions.
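The abstract reports results as a frame error rate. The record does not define the metric, so the following is a minimal sketch under the assumption that it is the fraction of frames whose speech/nonspeech label disagrees with a reference labeling. The function name and the toy label sequences are hypothetical and are not taken from the paper.

```python
def frame_error_rate(reference, hypothesis):
    """Fraction of frames whose predicted label differs from the reference.

    Assumed definition: the record itself does not spell out the metric.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must be non-empty")
    errors = sum(r != h for r, h in zip(reference, hypothesis))
    return errors / len(reference)


# Hypothetical per-frame labels: 1 = speech, 0 = nonspeech.
reference = [0, 0, 1, 1, 1, 1, 0, 0]
hypothesis = [0, 1, 1, 1, 0, 1, 0, 0]
fer = frame_error_rate(reference, hypothesis)

# Relative reduction implied by the two rates quoted in the abstract.
baseline_fer = 0.306  # Gaussian mixture models on cepstral features
proposed_fer = 0.253  # alternative features with a discriminative classifier
relative_reduction = (baseline_fer - proposed_fer) / baseline_fer
```

Under this definition, the two figures quoted in the abstract correspond to a relative reduction in frame error rate of roughly 17%.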
Pages: 1975-1978
Number of pages: 4