Exploring speech retrieval from meetings using the AMI corpus

Cited by: 2

Authors:

Eskevich, Maria [1]

Jones, Gareth J. F. [1]

Affiliations:

[1] Dublin City Univ, Sch Comp, CNGL Ctr Global Intelligent Content, Dublin 9, Ireland

Source:

COMPUTER SPEECH AND LANGUAGE | 2014 / Volume 28 / Issue 05

Funding:

Science Foundation Ireland;

Keywords:

Speech retrieval; Recall-focused information retrieval; Informal spoken content search; Retrieval unit segmentation; RECOGNITION; DOCUMENTS;

DOI:

10.1016/j.csl.2013.12.005

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Increasing amounts of informal spoken content are being collected, e.g. recordings of meetings, lectures and personal data sources. The amount of this content being captured and the difficulties of manually searching audio data mean that efficient automated search tools are of increasing importance if its full potential is to be realized. Much existing work on speech search has focused on retrieval of clearly defined document units in ad hoc search tasks. We investigate search of informal speech content using an extended version of the AMI meeting collection. A retrieval collection was constructed by augmenting the AMI corpus with a set of ad hoc search requests and manually identified relevant regions of the recorded meetings. Unlike standard ad hoc information retrieval focussing primarily on precision, we assume a recall-focused search scenario of a user seeking to retrieve a particular incident occurring within meetings relevant to the query. We explore the relationship between automatic speech recognition (ASR) accuracy, automated segmentation of the meeting into retrieval units and retrieval behaviour with respect to both precision and recall. Experimental retrieval results show that while averaged retrieval effectiveness is generally comparable in terms of precision for automatically extracted segments for manual content transcripts and ASR transcripts with high recognition accuracy, segments with poor recognition quality become very hard to retrieve and may fall below the retrieval rank position to which a user is willing to search. These changes impact on system effectiveness for recall-focused search tasks. Varied ASR quality across the relevant and non-relevant data means that the rank of some well-recognized relevant segments is actually promoted for ASR transcripts compared to manual ones. This effect is not revealed by the averaged precision-based retrieval evaluation metrics typically used for evaluation of speech retrieval.
However, such variations in the ranks of relevant segments can impact considerably on the experience of the user in terms of the order in which retrieved content is presented. Analysis of our results reveals that while relevant longer segments are generally more robust to ASR errors, and consequently retrieved at higher ranks, this is often at the expense of the user needing to engage in longer content playback to locate the relevant content in the audio recording. Our overall conclusion is that it is desirable to minimize the length of retrieval units containing relevant content while seeking to maintain high ranking of these items. (C) 2014 Elsevier Ltd. All rights reserved.

Pages: 1021 - 1044

Number of pages: 24

50 records in total

[31] CROSS-CORPUS DEPRESSION PREDICTION FROM SPEECH
Mitra, Vikramjit
Shriberg, Elizabeth
Vergyri, Dimitra
Knoth, Bruce
Salomon, Ronald M.
2015 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING (ICASSP), 2015, : 4769 - 4773
[32] Speech Segmentation Optimization using Segmented Bilingual Speech Corpus for End-to-end Speech Translation
Fukuda, Ryo
Sudoh, Katsuhito
Nakamura, Satoshi
INTERSPEECH 2022, 2022, : 121 - 125
[33] Exploring a corpus of scientific texts using data mining
Teich, Elke
Fankhauser, Peter
CORPUS-LINGUISTIC APPLICATIONS CURRENT STUDIES, NEW DIRECTIONS, 2010, 71 : 233 - 247
[34] DNN-based Speech Synthesis Using Abundant Tags of Spontaneous Speech Corpus
Yamashita, Yuki
Koriyama, Tomoki
Saito, Yuki
Takamichi, Shinnosuke
Ijima, Yusuke
Masumura, Ryo
Saruwatari, Hiroshi
PROCEEDINGS OF THE 12TH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION (LREC 2020), 2020, : 6438 - 6443
[35] OVERT SPEECH RETRIEVAL FROM NEUROMAGNETIC SIGNALS USING WAVELETS AND ARTIFICIAL NEURAL NETWORKS
Dash, Debadatta
Ferrari, Paul
Malik, Saleem
Wang, Jun
2018 IEEE GLOBAL CONFERENCE ON SIGNAL AND INFORMATION PROCESSING (GLOBALSIP 2018), 2018, : 489 - 493
[36] Speech retrieval from unsegmented Finnish audio using statistical morpheme-like units for segmentation, recognition, and retrieval
Turunen V.T.
Kurimo M.
ACM Transactions on Speech and Language Processing, 2011, 8 (01):
[37] Multimedia document retrieval using speech and speaker recognition
Viswanathan M.
Beigi H.S.M.
Dharanipragada S.
Maali F.
Tritschler A.
International Journal on Document Analysis and Recognition, 2000, 2 (04) : 147 - 162
[38] Development of Assamese Speech Corpus and Automatic Transcription Using HTK
Sarma, Himangshu
Saharia, Navanath
Sharma, Utpal
ADVANCES IN SIGNAL PROCESSING AND INTELLIGENT RECOGNITION SYSTEMS, 2014, 264 : 119 - 132
[39] Developing Tamil Emotional Speech Corpus and Evaluating using SVM
Joe, C. Vijesh
2014 International Conference on Science Engineering and Management Research (ICSEMR), 2014,
[40] Using a Serious Game to Collect a Child Learner Speech Corpus
Baur, Claudia
Rayner, Manny
Tsourakis, Nikos
LREC 2014 - NINTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2014, : 2726 - 2732
