Exploring speech retrieval from meetings using the AMI corpus

Cited by: 2
Authors
Eskevich, Maria [1 ]
Jones, Gareth J. F. [1 ]
Affiliations
[1] Dublin City Univ, Sch Comp, CNGL Ctr Global Intelligent Content, Dublin 9, Ireland
Source
COMPUTER SPEECH AND LANGUAGE | 2014, Vol. 28, No. 5
Funding
Science Foundation Ireland
Keywords
Speech retrieval; Recall-focused information retrieval; Informal spoken content search; Retrieval unit segmentation; RECOGNITION; DOCUMENTS;
DOI
10.1016/j.csl.2013.12.005
CLC classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Increasing amounts of informal spoken content are being collected, e.g., recordings of meetings, lectures and personal data sources. The amount of this content being captured and the difficulties of manually searching audio data mean that efficient automated search tools are of increasing importance if its full potential is to be realized. Much existing work on speech search has focused on retrieval of clearly defined document units in ad hoc search tasks. We investigate search of informal speech content using an extended version of the AMI meeting collection. A retrieval collection was constructed by augmenting the AMI corpus with a set of ad hoc search requests and manually identified relevant regions of the recorded meetings. Unlike standard ad hoc information retrieval, which focuses primarily on precision, we assume a recall-focused search scenario of a user seeking to retrieve a particular incident occurring within meetings relevant to the query. We explore the relationship between automatic speech recognition (ASR) accuracy, automated segmentation of the meeting into retrieval units and retrieval behaviour with respect to both precision and recall. Experimental retrieval results show that while averaged retrieval effectiveness is generally comparable in terms of precision for automatically extracted segments for manual content transcripts and ASR transcripts with high recognition accuracy, segments with poor recognition quality become very hard to retrieve and may fall below the retrieval rank position to which a user is willing to search. These changes affect system effectiveness for recall-focused search tasks. Varied ASR quality across the relevant and non-relevant data means that the rank of some well-recognized relevant segments is actually promoted for ASR transcripts compared to manual ones. This effect is not revealed by the averaged, precision-based retrieval evaluation metrics typically used for evaluation of speech retrieval.
However, such variations in the ranks of relevant segments can considerably impact the experience of the user in terms of the order in which retrieved content is presented. Analysis of our results reveals that while relevant longer segments are generally more robust to ASR errors, and consequently retrieved at higher ranks, this is often at the expense of the user needing to engage in longer content playback to locate the relevant content in the audio recording. Our overall conclusion is that it is desirable to minimize the length of retrieval units containing relevant content while seeking to maintain high ranking of these items. (C) 2014 Elsevier Ltd. All rights reserved.
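The core effect the abstract describes — ASR errors destroying query-term matches so that poorly recognized segments drop down the ranking — can be illustrated with a toy retrieval sketch. This is a hypothetical example with invented segment texts and a simple length-normalized term-overlap score, not the retrieval model or data used in the paper:

```python
import math
from collections import Counter

def score(query: str, doc: str) -> float:
    """Length-normalized term-frequency overlap between query and document."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    dot = sum(q[t] * d[t] for t in q)           # shared-term weight
    norm = math.sqrt(sum(v * v for v in d.values()))  # penalize long segments
    return dot / norm if norm else 0.0

# Hypothetical meeting segments: the same relevant content with a clean
# (manual) transcript and with simulated ASR substitution errors.
segments = {
    "clean": "the remote control design meeting discussed battery life",
    "noisy": "the remote patrol design meat king discussed batter fly",
    "other": "lunch plans and the weather were the main topics today",
}

query = "remote control design meeting"
ranking = sorted(segments, key=lambda k: score(query, segments[k]), reverse=True)
print(ranking)  # the ASR-degraded segment falls behind the clean one
```

Because the misrecognized words no longer match the query terms, the noisy variant of the relevant segment scores lower and is ranked below its clean counterpart, which is the mechanism by which relevant segments can fall past the rank position a user is willing to examine.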
Pages: 1021-1044
Page count: 24
Related papers
50 records
  • [1] The AMI system for the transcription of speech in meetings
    Hain, Thomas
    Burget, Lukas
    Dines, John
    Garau, Giulia
    Karafiat, Martin
    Lincoln, Mike
    Vepa, Jithendra
    Wan, Vincent
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3, 2007, : 357 - +
  • [2] The 2005 AMI system for the transcription of speech in meetings
    Hain, T
    Burget, L
    Dines, J
    Garau, G
    Karafiat, M
    Lincoln, M
    McCowan, I
    Moore, D
    Wan, V
    Ordelman, R
    Renals, S
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2005, 3869 : 450 - 462
  • [3] The development of the AMI system for the transcription of speech in meetings
    Hain, T
    Burget, L
    Dines, J
    McCowan, I
    Garau, G
    Karafiat, M
    Lincoln, M
    Moore, D
    Wan, V
    Ordelman, R
    Renals, S
    MACHINE LEARNING FOR MULTIMODAL INTERACTION, 2005, 3869 : 344 - 356
  • [4] Annotation and Recognition of Personality Traits in Spoken Conversations from the AMI Meetings Corpus
    Valente, Fabio
    Kim, Samuel
    Motlicek, Petr
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 1182 - 1185
  • [5] EXPLORING DEMENTIA DETECTION FROM SPEECH: CROSS CORPUS ANALYSIS
    Ablimit, Ayimnisagul
    Botelho, Catarina
    Abad, Alberto
    Schultz, Tanja
    Trancoso, Isabel
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 6472 - 6476
  • [6] Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus
    Valente, Fabio
    Vinciarelli, Alessandro
    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5, 2011, : 3084 - +
  • [7] More to Meetings: Challenges in Using Speech-Based Technology to Support Meetings
    McGregor, Moira
    Tang, John C.
    CSCW'17: PROCEEDINGS OF THE 2017 ACM CONFERENCE ON COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING, 2017, : 2208 - 2220
  • [8] Information Retrieval and Recommendation using Emotion from Speech Signals
    Iliev, Alexander
    Stanchev, Peter L.
    IEEE 1ST CONFERENCE ON MULTIMEDIA INFORMATION PROCESSING AND RETRIEVAL (MIPR 2018), 2018, : 222 - 225
  • [9] Lexical Analysis Using Regular Expressions for Information Retrieval from a Legal Corpus
    Mario Spositto, Osvaldo
    Cesar Bossero, Julio
    Javier Moreno, Edgardo
    Alejandra Ledesma, Viviana
    Romina Matteo, Lorena
    COMPUTER SCIENCE, CACIC 2021, 2022, 1584 : 312 - 324
  • [10] Exploring corpus-invariant emotional acoustic feature for cross-corpus speech emotion recognition
    Lian, Hailun
    Lu, Cheng
    Zhao, Yan
    Li, Sunan
    Qi, Tianhua
    Zong, Yuan
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 258