A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin

Cited: 0
Authors
Yu, Jun [1 ,2 ,3 ]
Su, Rongfeng [1 ,2 ]
Wang, Lan [1 ,2 ]
Zhou, Wenpeng [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Key Lab Human Machine Intelligence Synergy Syst, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[3] Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou, Gansu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D audio-visual; 3D facial motion; multi-channel; multi-speaker;
DOI
Not available
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
This paper presents a multi-channel/multi-speaker 3D audio-visual corpus for Mandarin continuous speech recognition and related fields, such as speech visualization and speech synthesis. The corpus comprises recordings from 24 speakers, about 18k utterances and roughly 20 hours of data in total. For each utterance, the audio streams were recorded by two professional microphones, one in the near field and one in the far field, while a marker-based 3D facial motion capture system with six infrared cameras acquired the 3D video streams. In addition, corresponding 2D video streams were captured by a supplementary camera. A data-processing pipeline is described for synchronizing the audio and video streams, detecting and correcting outliers, and removing head motion introduced during recording. Results of this data processing are also discussed. To date, this corpus is the largest 3D audio-visual corpus for Mandarin.
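A pipeline like the one the abstract describes must estimate the time offset between independently recorded streams (audio, 3D motion capture, 2D video) before they can be synchronized. As an illustrative sketch only, not the authors' actual method, the following shows the common cross-correlation approach to lag estimation on two sampled signals; the names `estimate_lag`, `ref`, and `sig` are hypothetical:

```python
def estimate_lag(ref, sig, max_lag):
    """Return the non-negative lag (in samples) at which `sig` best
    aligns with `ref`, scored by a raw cross-correlation sum."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Slide `sig` left by `lag` samples and score the overlap.
        score = sum(r * s for r, s in zip(ref, sig[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# A short pulse delayed by two samples is recovered at lag 2.
ref = [0, 0, 1, 2, 3, 0, 0]
sig = [0, 0] + ref  # the same pulse, delayed by two samples
print(estimate_lag(ref, sig, max_lag=5))  # → 2
```

In practice such a lag search would run on feature envelopes (e.g. audio energy versus marker velocity) rather than raw samples, and production systems often rely on a shared hardware clock or clapboard event instead.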
Pages: 5
Related Papers
50 items in total
  • [41] An Audio-visual 3D Virtual Articulation System for Visual Speech Synthesis
    Li, Rui
    Yu, Jun
    2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON HAPTIC, AUDIO AND VISUAL ENVIRONMENTS AND GAMES (HAVE), 2017, : 25 - 30
  • [42] TAL: A SYNCHRONISED MULTI-SPEAKER CORPUS OF ULTRASOUND TONGUE IMAGING, AUDIO, AND LIP VIDEOS
    Ribeiro, Manuel Sam
    Sanger, Jennifer
    Zhang, Jing-Xuan
    Eshky, Aciel
    Wrench, Alan
    Richmond, Korin
    Renals, Steve
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 1109 - 1116
  • [43] 3D SPATIAL FEATURES FOR MULTI-CHANNEL TARGET SPEECH SEPARATION
    Gu, Rongzhi
    Zhang, Shi-Xiong
    Yu, Meng
    Yu, Dong
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 996 - 1002
  • [44] Multi-Channel Speaker Verification for Single and Multi-talker Speech
    Kataria, Saurabh
    Zhang, Shi-Xiong
    Yu, Dong
    INTERSPEECH 2021, 2021, : 4608 - 4612
  • [45] Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function
    Wang, Qing
    Chen, Hang
    Jiang, Ya
    Wang, Zhe
    Wang, Yuyang
    Du, Jun
    Lee, Chin-Hui
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 250 - 254
  • [46] NON-ZERO DIFFUSION PARTICLE FLOW SMC-PHD FILTER FOR AUDIO-VISUAL MULTI-SPEAKER TRACKING
    Liu, Yang
    Hilton, Adrian
    Chambers, Jonathon
    Zhao, Yuxin
    Wang, Wenwu
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4304 - 4308
  • [47] Serial Transmission of Audio Signals for Multi-channel Speaker Systems
    Kwon, Ohkyun
    Song, Moonvin
    Lee, Seungwon
    Lee, Youngwon
    Chung, Yunmo
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2005, 24 (07): : 387 - 394
  • [48] Multimodal (audio-visual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking
    Naqvi, S. Mohsen
    Wang, W.
    Khan, M. Salman
    Barnard, M.
    Chambers, J. A.
    IET SIGNAL PROCESSING, 2012, 6 (05) : 466 - 477
  • [49] Speaker-Independent Audio-Visual Speech Separation Based on Transformer in Multi-Talker Environments
    Wang, Jing
    Luo, Yiyu
    Yi, Weiming
    Xie, Xiang
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (04) : 766 - 777
  • [50] Building a Synchronous Corpus of Acoustic and 3D Facial Marker Data for Adaptive Audio-visual Speech Synthesis
    Schabus, Dietmar
    Pucher, Michael
    Hofer, Gregor
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 3313 - 3316