A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin

Times cited: 0
Authors
Yu, Jun [1, 2, 3]
Su, Rongfeng [1, 2]
Wang, Lan [1, 2]
Zhou, Wenpeng [1, 2]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Key Lab Human Machine Intelligence Synergy Syst, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
[3] Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou, Gansu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
3D audio-visual; 3D facial motion; multi-channel; multi-speaker
DOI
Not available
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Discipline classification code
081202
Abstract
This paper presents a multi-channel/multi-speaker 3D audio-visual corpus for Mandarin continuous speech recognition and related fields such as speech visualization and speech synthesis. The corpus consists of about 18k utterances from 24 speakers, about 20 hours in total. For each utterance, the audio streams were recorded by two professional microphones, one near-field and one far-field, while a marker-based 3D facial motion capture system with six infrared cameras acquired the 3D video streams. In addition, a separate camera captured the corresponding 2D video streams as a supplement. The paper describes a data processing pipeline for synchronizing the audio and video streams, detecting and correcting outliers, and removing the head motion that occurred during recording. Finally, the results of the data processing are discussed. To date, this corpus is the largest 3D audio-visual corpus for Mandarin.
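The abstract names the preprocessing steps but not the algorithms used. The following is a minimal, hypothetical Python sketch of two of them under assumptions the paper does not confirm: the offset between two streams that carry a comparable signal (for example, an audio track recorded alongside the video) is estimated by maximizing cross-correlation over integer lags, and outliers in a single marker-coordinate trajectory are detected and corrected with a Hampel filter (local median plus a scaled median absolute deviation threshold). The function names `estimate_lag` and `hampel_filter` and all parameter values are illustrative only.

```python
from statistics import median


def estimate_lag(reference, delayed, max_lag):
    """Return the integer lag (in samples) that best aligns `delayed` with `reference`.

    Assumption (hypothetical): both streams carry a comparable signal at the
    same sampling rate. The lag maximizing the raw cross-correlation is
    returned; a positive value means `delayed` starts that many samples
    later than `reference`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Cross-correlation at this lag, summed over the overlapping samples only.
        score = sum(
            x * delayed[i + lag]
            for i, x in enumerate(reference)
            if 0 <= i + lag < len(delayed)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def hampel_filter(signal, half_window=3, n_sigmas=3.0):
    """Flag and replace outliers in one marker-coordinate trajectory.

    A sample is an outlier if it lies more than `n_sigmas` robust standard
    deviations (1.4826 * median absolute deviation) from the median of its
    local window; flagged samples are replaced by that median so the frame
    count is unchanged. Returns (cleaned_signal, outlier_indices).
    """
    scale = 1.4826  # converts MAD to a std-dev estimate under Gaussian noise
    n = len(signal)
    cleaned = list(signal)
    outliers = []
    for i in range(n):
        # Window is taken from the original signal, clipped at both ends.
        window = signal[max(0, i - half_window):min(n, i + half_window + 1)]
        med = median(window)
        mad = scale * median([abs(x - med) for x in window])
        if abs(signal[i] - med) > n_sigmas * mad:
            cleaned[i] = med
            outliers.append(i)
    return cleaned, outliers
```

Unnormalized cross-correlation favors lags with more overlapping samples, so `max_lag` should stay small relative to the signal length. The filter replaces flagged samples rather than deleting them, which keeps the 3D frames aligned one-to-one with the synchronized audio.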
Pages: 5