CLOTHO: AN AUDIO CAPTIONING DATASET

Cited by: 0

Authors:

Drossos, Konstantinos ^{[1]}

Lipping, Samuel ^{[1]}

Virtanen, Tuomas ^{[1]}

Affiliations:

[1] Tampere Univ, Audio Res Grp, Tampere, Finland

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Funding:

European Research Council;

Keywords:

audio captioning; dataset; Clotho;

DOI:

10.1109/icassp40776.2020.9052990

Chinese Library Classification number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Audio captioning is the novel task of general audio content description using free text. It is an intermodal translation task (not speech-to-text), where a system accepts an audio signal as input and outputs a textual description (i.e., the caption) of that signal. In this paper, we present Clotho, a dataset for audio captioning consisting of 4981 audio samples of 15 to 30 seconds in duration and 24 905 captions of eight to 20 words in length, and a baseline method to provide initial results. Clotho is built with a focus on audio content and caption diversity, and the data splits do not hamper the training or evaluation of methods. All sounds are from the Freesound platform, and captions are crowdsourced using Amazon Mechanical Turk and annotators from English-speaking countries. Unique words, named entities, and speech transcription are removed with post-processing. Clotho is freely available online.
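
The figures in the abstract can be checked with a little arithmetic. The following is a minimal sketch, not the authors' tooling: the function names are hypothetical, and counting words by splitting on whitespace is an assumption, since the abstract does not say how words were counted. Only the numeric limits (15 to 30 seconds, eight to 20 words, 4981 samples, 24 905 captions) come from the abstract.

```python
# Limits taken from the abstract; everything else here is illustrative.
MIN_WORDS, MAX_WORDS = 8, 20
MIN_SECONDS, MAX_SECONDS = 15.0, 30.0
NUM_SAMPLES, NUM_CAPTIONS = 4981, 24905


def caption_in_range(caption: str) -> bool:
    """True if the caption has 8 to 20 words (assumption: whitespace-separated)."""
    return MIN_WORDS <= len(caption.split()) <= MAX_WORDS


def duration_in_range(seconds: float) -> bool:
    """True if the clip length lies within the stated 15 to 30 second range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


# Quotient and remainder of captions over samples.
captions_per_sample, remainder = divmod(NUM_CAPTIONS, NUM_SAMPLES)

if __name__ == "__main__":
    # Hypothetical caption, written for this example only.
    example = "A door creaks open slowly while rain falls steadily outside"
    print(caption_in_range(example), duration_in_range(22.5))
    print(captions_per_sample, remainder)
```

The division leaves no remainder, so the stated counts are consistent with an average of exactly five captions per audio sample.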


Pages: 736-740

Page count: 5
