A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin

Cited by: 0

Authors:

Yu, Jun [1,2,3]

Su, Rongfeng [1,2]

Wang, Lan [1,2]

Zhou, Wenpeng [1,2]

Affiliations:

[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Key Lab Human Machine Intelligence Synergy Syst, Shenzhen, Peoples R China

[2] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China

[3] Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou, Gansu, Peoples R China

Source:

2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP) | 2016

Funding:

National Natural Science Foundation of China;

Keywords:

3D audio-visual; 3D facial motion; multi-channel; multi-speaker;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202;

Abstract:

This paper presents a multi-channel, multi-speaker 3D audio-visual corpus for Mandarin continuous speech recognition and related fields such as speech visualization and speech synthesis. The corpus comprises about 18k utterances from 24 speakers, about 20 hours in total. For each utterance, the audio streams were recorded by two professional microphones, one in the near field and one in the far field, while a marker-based 3D facial motion capture system with six infrared cameras acquired the 3D video streams. In addition, the corresponding 2D video streams were captured by an additional camera as a supplement. The paper describes a data-processing procedure for synchronizing the audio and video streams, detecting and correcting outliers, and removing head motion during recording, and discusses the results of this processing. To date, this corpus is the largest 3D audio-visual corpus for Mandarin.


Pages: 5

