Oceanship: A Large-Scale Dataset for Underwater Audio Target Recognition

Authors
Li, Zeyu [1 ,2 ]
Xiang, Suncheng [1 ,3 ]
Yu, Tong [1 ,2 ]
Gao, Jingsheng [1 ,2 ]
Ruan, Jiacheng [1 ,2 ]
Hu, Yanping [1 ,2 ]
Liu, Ting [1 ,2 ]
Fu, Yuzhuo [1 ,2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[2] Sch Elect Informat & Elect Engn, Shanghai, Peoples R China
[3] Sch Biomed Engn, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Underwater Acoustic Target Recognition; Audio Retrieval; Zero-Shot Classification;
DOI
10.1007/978-981-97-5591-2_40
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Underwater audio recognition plays a significant role in identifying vessels while they are in motion. Underwater target recognition has applications in marine environmental protection, detection of ship-radiated noise, underwater noise control, and coastal vessel dispatch. The traditional underwater acoustic target recognition (UATR) task involves training a network to extract features from audio data and predict the vessel type. Existing UATR datasets fall short in both total duration and number of samples. In this paper, we propose Oceanship, a large-scale and diverse underwater audio dataset. The dataset comprises 15 categories, spans a total duration of 121 h, and includes comprehensive annotations such as coordinates, velocity, vessel type, and timestamps. We compiled the dataset by crawling and organizing original communication data from the Ocean Communication Network (ONC) database between 2021 and 2022. Although audio retrieval is well established in general audio classification, it has not been explored for underwater audio recognition. Leveraging the Oceanship dataset, we introduce a baseline model named Oceannet for underwater audio retrieval. This model achieves a recall at 1 (R@1) of 67.11% and a recall at 5 (R@5) of 99.13% on the Deepship dataset.
Pages: 475-486
Page count: 12