A multi-channel/multi-speaker interactive 3D Audio-Visual Speech Corpus in Mandarin

Cited: 0
Authors
Yu, Jun [1 ,2 ,3 ]
Su, Rongfeng [1 ,2 ]
Wang, Lan [1 ,2 ]
Zhou, Wenpeng [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Shenzhen Inst Adv Technol, Key Lab Human Machine Intelligence Synergy Syst, Shenzhen, Peoples R China
[2] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[3] Lanzhou Univ, Sch Informat Sci & Engn, Lanzhou, Gansu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3D audio-visual; 3D facial motion; multi-channel; multi-speaker;
DOI
Not available
Chinese Library Classification
TP301 [Theory and Methods];
Discipline Code
081202;
Abstract
This paper presents a multi-channel/multi-speaker 3D audio-visual corpus for Mandarin continuous speech recognition and related fields, such as speech visualization and speech synthesis. The corpus comprises recordings from 24 speakers, about 18k utterances and roughly 20 hours of data in total. For each utterance, the audio streams were recorded by two professional microphones, one in the near field and one in the far field, while a marker-based 3D facial motion capture system with six infrared cameras acquired the 3D video streams. In addition, corresponding 2D video streams were captured by a supplementary camera. A data-processing pipeline is described for synchronizing the audio and video streams, detecting and correcting outliers, and removing head motion introduced during recording. Results of this data processing are also discussed. To date, this corpus is the largest 3D audio-visual corpus for Mandarin.
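A pipeline like the one the abstract describes must estimate the time offset between independently recorded streams (audio, 3D motion capture, 2D video) before they can be synchronized. As an illustrative sketch only, not the authors' actual method, the following shows the common cross-correlation approach to lag estimation on two sampled signals; the names `estimate_lag`, `ref`, and `sig` are hypothetical:

```python
def estimate_lag(ref, sig, max_lag):
    """Return the non-negative lag (in samples) at which `sig` best
    aligns with `ref`, scored by a raw cross-correlation sum."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        # Slide `sig` left by `lag` samples and score the overlap.
        score = sum(r * s for r, s in zip(ref, sig[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# A short pulse delayed by two samples is recovered at lag 2.
ref = [0, 0, 1, 2, 3, 0, 0]
sig = [0, 0] + ref  # the same pulse, delayed by two samples
print(estimate_lag(ref, sig, max_lag=5))  # → 2
```

In practice such a lag search would run on feature envelopes (e.g. audio energy versus marker velocity) rather than raw samples, and production systems often rely on a shared hardware clock or clapboard event instead.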
Pages: 5
Related Papers
50 items in total
  • [41] An Audio-visual 3D Virtual Articulation System for Visual Speech Synthesis
    Li, Rui
    Yu, Jun
    2017 15TH IEEE INTERNATIONAL SYMPOSIUM ON HAPTIC, AUDIO AND VISUAL ENVIRONMENTS AND GAMES (HAVE), 2017, : 25 - 30
  • [42] TAL: A SYNCHRONISED MULTI-SPEAKER CORPUS OF ULTRASOUND TONGUE IMAGING, AUDIO, AND LIP VIDEOS
    Ribeiro, Manuel Sam
    Sanger, Jennifer
    Zhang, Jing-Xuan
    Eshky, Aciel
    Wrench, Alan
    Richmond, Korin
    Renals, Steve
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 1109 - 1116
  • [43] 3D SPATIAL FEATURES FOR MULTI-CHANNEL TARGET SPEECH SEPARATION
    Gu, Rongzhi
    Zhang, Shi-Xiong
    Yu, Meng
    Yu, Dong
    2021 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2021, : 996 - 1002
  • [44] Multi-Channel Speaker Verification for Single and Multi-talker Speech
    Kataria, Saurabh
    Zhang, Shi-Xiong
    Yu, Dong
    INTERSPEECH 2021, 2021, : 4608 - 4612
  • [45] Deep Learning Based Audio-Visual Multi-Speaker DOA Estimation Using Permutation-Free Loss Function
    Wang, Qing
    Chen, Hang
    Jiang, Ya
    Wang, Zhe
    Wang, Yuyang
    Du, Jun
    Lee, Chin-Hui
    2022 13TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2022, : 250 - 254
  • [46] NON-ZERO DIFFUSION PARTICLE FLOW SMC-PHD FILTER FOR AUDIO-VISUAL MULTI-SPEAKER TRACKING
    Liu, Yang
    Hilton, Adrian
    Chambers, Jonathon
    Zhao, Yuxin
    Wang, Wenwu
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 4304 - 4308
  • [47] Serial Transmission of Audio Signals for Multi-channel Speaker Systems
    Kwon, Ohkyun
    Song, Moonvin
    Lee, Seungwon
    Lee, Youngwon
    Chung, Yunmo
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2005, 24 (07): : 387 - 394
  • [48] Multimodal (audio-visual) source separation exploiting multi-speaker tracking, robust beamforming and time-frequency masking
    Naqvi, S. Mohsen
    Wang, W.
    Khan, M. Salman
    Barnard, M.
    Chambers, J. A.
    IET SIGNAL PROCESSING, 2012, 6 (05) : 466 - 477
  • [49] Speaker-Independent Audio-Visual Speech Separation Based on Transformer in Multi-Talker Environments
    Wang, Jing
    Luo, Yiyu
    Yi, Weiming
    Xie, Xiang
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2022, E105D (04) : 766 - 777
  • [50] Building a Synchronous Corpus of Acoustic and 3D Facial Marker Data for Adaptive Audio-visual Speech Synthesis
    Schabus, Dietmar
    Pucher, Michael
    Hofer, Gregor
    LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2012, : 3313 - 3316