LIMMITS'24: Multi-Speaker, Multi-Lingual INDIC TTS With Voice Cloning

Times Cited: 0
Authors
Udupa, Sathvik [1 ]
Bandekar, Jesuraja [1 ]
Singh, Abhayjeet [1 ]
Deekshitha, G. [1 ]
Kumar, Saurabh [1 ]
Badiger, Sandhya [1 ]
Nagireddi, Amala [1 ]
Roopa, R. [1 ]
Ghosh, Prasanta Kumar [1 ]
Murthy, Hema A. [2 ]
Kumar, Pranaw [3 ]
Tokuda, Keiichi [4 ]
Hasegawa-Johnson, Mark [5 ]
Olbrich, Philipp [6 ]
Affiliations
[1] Indian Inst Sci IISc, Elect Engn Dept, Bangalore 560012, India
[2] Indian Inst Technol, Dept Comp Sci & Engn, Chennai 600036, India
[3] CDAC, Mumbai 400049, India
[4] Nagoya Inst Technol, Dept Comp Sci, Nagoya 4668555, Japan
[5] Univ Illinois, Dept Elect & Comp Engn, Champaign, IL 61820 USA
[6] Deutsch Gesell Internatl Zusammenarbeit GIZ GmbH, D-53113 Bonn, Germany
Keywords
Cloning; Multilingual; Signal processing; Training; Text to speech; Noise measurement; Vocabulary; Solid modeling; Manuals; Encoding; Speech synthesis; multi-speaker; multi-lingual TTS; voice cloning; cross-lingual synthesis
DOI
10.1109/OJSP.2025.3531782
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
The Multi-speaker, Multi-lingual Indic Text-to-Speech (TTS) with voice cloning (LIMMITS'24) challenge was organized as part of the ICASSP 2024 Signal Processing Grand Challenge. LIMMITS'24 aims at the development of voice cloning for multi-speaker, multi-lingual TTS models. Towards this, 80 hours of TTS data have been released in each of the Bengali, Chhattisgarhi, English (Indian), and Kannada languages, in addition to the Telugu, Hindi, and Marathi data released during the LIMMITS'23 challenge. The challenge encourages the advancement of TTS in Indian languages as well as the development of multi-speaker voice cloning techniques for TTS. The three tracks of LIMMITS'24 have provided an opportunity for researchers and practitioners around the world to explore the state of the art in voice cloning for TTS.
Pages: 293-302
Number of pages: 10