A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin

Cited by: 0
Authors
Yu, Jun [1 ,2 ,3 ]
Su, Rongfeng [1 ,2 ]
Wang, Lan [1 ,2 ]
Zhou, Wenpeng [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Key Lab Human Machine Intelligence Synergy Syst, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
[3] Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou, Gansu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D audio-visual; 3D facial motion; multi-channel; multi-speaker;
DOI
Not available
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
This paper presents a multi-channel, multi-speaker 3D audio-visual corpus for Mandarin continuous speech recognition and related fields such as speech visualization and speech synthesis. The corpus comprises 24 speakers and about 18k utterances, roughly 20 hours in total. For each utterance, the audio streams were recorded by two professional microphones, one near-field and one far-field, while a marker-based 3D facial motion capture system with six infrared cameras acquired the 3D video streams. In addition, corresponding 2D video streams were captured by a supplementary camera. The paper describes a data processing pipeline for synchronizing the audio and video streams, detecting and correcting outliers, and removing head motion introduced during recording. Finally, the data processing results are discussed. To date, this is the largest 3D audio-visual corpus for Mandarin.
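The abstract does not detail how head motion is removed from the marker trajectories. A common approach for marker-based 3D facial capture is rigid Procrustes (Kabsch) alignment of each frame to a reference pose, which factors out global rotation and translation while preserving non-rigid facial motion. The sketch below illustrates that technique under assumed data layouts; the function name and array shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def remove_head_motion(frames, ref):
    """Remove rigid head motion from 3D marker trajectories.

    frames: (T, N, 3) array of N marker positions over T frames (hypothetical layout).
    ref:    (N, 3) reference pose, e.g. a neutral frame.
    Returns frames rigidly aligned to ref via the Kabsch algorithm.
    """
    aligned = np.empty_like(frames)
    ref_c = ref - ref.mean(axis=0)           # centered reference markers
    for t, frame in enumerate(frames):
        X = frame - frame.mean(axis=0)       # remove per-frame translation
        # Optimal rotation mapping X onto ref_c (Kabsch, row-vector convention).
        U, _, Vt = np.linalg.svd(X.T @ ref_c)
        d = np.sign(np.linalg.det(U @ Vt))
        D = np.diag([1.0, 1.0, d])           # guard against reflections
        R = U @ D @ Vt
        aligned[t] = X @ R + ref.mean(axis=0)
    return aligned
```

Aligning every frame to a single neutral pose keeps only the articulatory (lip and jaw) motion, which is what downstream audio-visual speech models typically consume.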
Pages: 5