LEVERAGING EFFECTIVE LANGUAGE AND SPEAKER CONDITIONING IN INDIC TTS FOR LIMMITS 2024 CHALLENGE

Authors
Jeon, Yejin [1]
Kim, Young Jae [1]
Lee, Gary Geunbae [1, 2]
Affiliations
[1] POSTECH, Grad Sch AI, Pohang, South Korea
[2] POSTECH, Dept Comp Sci & Engn, Pohang, South Korea
Keywords
Text-to-speech; multi-lingual; voice conversion; LIMMITS SPGC
DOI
10.1109/ICASSPW62465.2024.10627578
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206; 082403
Abstract
In this paper, we explain the model that was developed by the NLP POSTECH team for the LIMMITS 2024 Grand Challenge. Among the three tracks, we focus on Track 1, which necessitates the creation of a few-shot text-to-speech (TTS) system that generates natural speech across diverse languages. Towards this end, to realize multi-lingual capability, we incorporate a learnable language embedding. In addition, for precise imitation of target speaker voices, we leverage an inductive speaker bias conditioning methodology. Despite the simplicity of our strategy, our model is able to demonstrate remarkable efficacy in the generation of natural speech and preservation of high speaker fidelity for both mono and cross-lingual settings.
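As a rough illustration of what a learnable language embedding can look like, the sketch below keeps one trainable vector per language ID and adds it to every text-encoder frame. This is not the authors' implementation: the abstract does not say how the embedding is injected, so the additive broadcast, the class and function names, the language IDs `lang_a` and `lang_b`, and the dimension are all assumptions made for illustration. The speaker conditioning method is not described in enough detail here to sketch.

```python
import random


class LanguageEmbedding:
    """Lookup table holding one trainable vector per language ID (hypothetical)."""

    def __init__(self, languages, dim, seed=0):
        rng = random.Random(seed)
        self.index = {lang: i for i, lang in enumerate(languages)}
        # Small random initialisation; training would adjust these values.
        self.table = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in languages]

    def lookup(self, lang):
        """Return the embedding vector for one language ID."""
        return self.table[self.index[lang]]

    def sgd_step(self, lang, grad, lr=0.1):
        """Plain gradient-descent update of a single language's vector."""
        row = self.table[self.index[lang]]
        for k, g in enumerate(grad):
            row[k] -= lr * g


def condition(encoder_frames, lang_vec):
    """Add the language vector to every encoder frame (broadcast over time)."""
    return [[x + e for x, e in zip(frame, lang_vec)] for frame in encoder_frames]


# Placeholder language IDs and sizes, chosen only for this example.
emb = LanguageEmbedding(["lang_a", "lang_b"], dim=4)
frames = [[0.0] * 4 for _ in range(3)]  # stand-in for text-encoder outputs
conditioned = condition(frames, emb.lookup("lang_a"))
```

Because only the row for the selected language receives an update, each language's vector is learned from that language's utterances alone, while the rest of the model is shared across languages.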
Pages: 67-68
Number of pages: 2