Multimedia fusion in automatic extraction of studio speech segments for spoken document retrieval

Cited by: 0

Authors:

Hui, PY [1]

Lo, WK [1]

Meng, HM [1]

Affiliations:

[1] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Human Comp Commun Lab, Shatin, Hong Kong, Peoples R China

Source:

2003 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL V, PROCEEDINGS: SENSOR ARRAY & MULTICHANNEL SIGNAL PROCESSING AUDIO AND ELECTROACOUSTICS MULTIMEDIA SIGNAL PROCESSING | 2003

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Subject classification codes:

070206; 082403;

Abstract:

This paper describes our progress in Cantonese spoken document retrieval. Over 60 hours of Cantonese television news broadcasts have been collected as part of the AoE-IT Multimedia Repository. We have also developed the Multimedia Markup Language (MmML) for annotating the multimedia content in terms of anchor/field video frames and audio recordings. The audio tracks are indexed by a Cantonese syllable recognizer. Our investigation indicates a large discrepancy in recognition performance: syllable accuracy drops from 59% to 39% (with a corresponding loss of reliability in audio indexing) as we move from anchor speech recorded in the studio to reporter/interview speech recorded in the field. Hence we present three automatic methods to extract anchor/studio speech from the audio tracks for retrieval: (i) extraction based only on video information using a fuzzy c-means algorithm; (ii) extraction based only on audio information using Gaussian Mixture Models; and (iii) a fusion strategy that combines video- and audio-based extraction. This paper presents the performance of these extraction techniques and the related retrieval performance in a known-item spoken document retrieval task.


Pages: 724-727

Page count: 4
