FRAMEWORK FOR EVALUATION OF SOUND EVENT DETECTION IN WEB VIDEOS

Cited by: 0
Authors
Badlani, Rohan [2]
Shah, Ankit [1]
Elizalde, Benjamin [1]
Kumar, Anurag [1]
Raj, Bhiksha [1]
Affiliations
[1] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
[2] BITS Pilani, Dept Comp Sci, Hyderabad, Telangana, India
Keywords
Sound Event Detection; Convolutional Neural Network; Large-Scale Audio Event Detection; Video Content Analysis
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
The largest source of sound events is web videos. Most videos lack sound event labels at the segment level; however, a significant number of them do respond to text queries, through matches that search engines find using metadata. In this paper, we explore the extent to which a search query can be used as the true label for the detection of sound events in videos. We present a framework for large-scale sound event recognition on web videos. The framework crawls videos using search queries corresponding to 78 sound event labels drawn from three datasets. The datasets are used to train three classifiers, and we obtain predictions on 3.7 million web video segments. We evaluate performance using the search query as the true label and compare it with human labeling. Both types of ground truth exhibit close performance, to within 10%, and a similar performance trend with an increasing number of evaluated segments. Hence, our experiments show the potential of using the search query as a preliminary true label for sound event recognition in web videos.
Pages: 3096-3100
Page count: 5