A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin

Cited by: 0
Authors
Yu, Jun [1 ,2 ,3 ]
Su, Rongfeng [1 ,2 ]
Wang, Lan [1 ,2 ]
Zhou, Wenpeng [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Key Lab Human Machine Intelligence Synergy Syst, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
[3] Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou, Gansu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D audio-visual; 3D facial motion; multi-channel; multi-speaker;
DOI
Not available
Chinese Library Classification
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
This paper presents a multi-channel, multi-speaker 3D audio-visual corpus for Mandarin continuous speech recognition and related fields such as speech visualization and speech synthesis. The corpus comprises 24 speakers and about 18k utterances, roughly 20 hours in total. For each utterance, the audio streams were recorded by two professional microphones, one near-field and one far-field, while a marker-based 3D facial motion capture system with six infrared cameras acquired the 3D video streams. In addition, corresponding 2D video streams were captured by a supplementary camera. The paper describes a data processing pipeline for synchronizing the audio and video streams, detecting and correcting outliers, and removing head motion introduced during recording. Finally, the data processing results are discussed. To date, this is the largest 3D audio-visual corpus for Mandarin.
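The abstract does not detail how head motion is removed from the marker trajectories. A common approach for marker-based 3D facial capture is rigid Procrustes (Kabsch) alignment of each frame to a reference pose, which factors out global rotation and translation while preserving non-rigid facial motion. The sketch below illustrates that technique under assumed data layouts; the function name and array shapes are hypothetical, not taken from the paper.

```python
import numpy as np

def remove_head_motion(frames, ref):
    """Remove rigid head motion from 3D marker trajectories.

    frames: (T, N, 3) array of N marker positions over T frames (hypothetical layout).
    ref:    (N, 3) reference pose, e.g. a neutral frame.
    Returns frames rigidly aligned to ref via the Kabsch algorithm.
    """
    aligned = np.empty_like(frames)
    ref_c = ref - ref.mean(axis=0)           # centered reference markers
    for t, frame in enumerate(frames):
        X = frame - frame.mean(axis=0)       # remove per-frame translation
        # Optimal rotation mapping X onto ref_c (Kabsch, row-vector convention).
        U, _, Vt = np.linalg.svd(X.T @ ref_c)
        d = np.sign(np.linalg.det(U @ Vt))
        D = np.diag([1.0, 1.0, d])           # guard against reflections
        R = U @ D @ Vt
        aligned[t] = X @ R + ref.mean(axis=0)
    return aligned
```

Aligning every frame to a single neutral pose keeps only the articulatory (lip and jaw) motion, which is what downstream audio-visual speech models typically consume.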
Pages: 5