Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval

Authors
Hui, PY [1]
Lo, WK [1]
Meng, HM [1]
Affiliation
[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Commun Lab, Shatin, Hong Kong, Peoples R China
Keywords
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
This paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of the AoE-IT Multimedia Repository. We have also developed the Multimedia Markup Language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese syllable recognizer. Our investigation shows a large discrepancy in recognition performance: syllable accuracy, and with it the reliability of audio indexing, drops from 59% to 39% as we move from anchor speech recorded in the studio to reporter/interview speech recorded in the field. We therefore present several automatic methods for extracting anchor/studio speech from the audio tracks for retrieval: (i) extraction based only on video information, using a fuzzy c-means algorithm; (ii) extraction based only on audio information, using Gaussian Mixture Models; and (iii) a fusion strategy that combines video- and audio-based extraction. This paper reports the performance of these extraction techniques and the corresponding retrieval performance in a known-item spoken document retrieval task.
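The abstract does not say how the video- and audio-based decisions are combined in strategy (iii). As an illustration only, the sketch below assumes that each extractor produces a per-segment score in [0, 1] (for example, a fuzzy c-means membership from video and a normalized GMM score from audio) and fuses them with a weighted average followed by a threshold. The function names, the weight, the threshold, and the sample scores are all hypothetical and are not taken from the paper.

```python
from typing import List


def fuse_scores(video_scores: List[float],
                audio_scores: List[float],
                video_weight: float = 0.5) -> List[float]:
    """Weighted linear fusion of per-segment studio-speech scores.

    Assumption: both inputs are scores in [0, 1], one per segment,
    where higher means "more likely anchor/studio speech".
    """
    if len(video_scores) != len(audio_scores):
        raise ValueError("score lists must cover the same segments")
    if not 0.0 <= video_weight <= 1.0:
        raise ValueError("video_weight must lie in [0, 1]")
    audio_weight = 1.0 - video_weight
    return [video_weight * v + audio_weight * a
            for v, a in zip(video_scores, audio_scores)]


def select_studio_segments(fused: List[float],
                           threshold: float = 0.5) -> List[int]:
    """Return indices of segments whose fused score reaches the threshold."""
    return [i for i, score in enumerate(fused) if score >= threshold]


# Hypothetical scores for four segments (made up for illustration).
video = [0.9, 0.2, 0.6, 0.1]
audio = [0.8, 0.4, 0.3, 0.2]

fused = fuse_scores(video, audio, video_weight=0.5)
studio = select_studio_segments(fused, threshold=0.5)
```

Only segments kept by `select_studio_segments` would then be passed to the syllable recognizer for indexing. In practice the weight and threshold would need to be tuned on held-out annotated data.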
Pages: 724-727
Page count: 4