Longitudinal Speaker Clustering and Verification Corpus with Code-Switching Frisian-Dutch Speech

Cited by: 9
Authors
Yilmaz, Emre [1]
Dijkstra, Jelske [2]
Van de Velde, Hans [2]
Kampstra, Frederik [3]
Algra, Jouke [3]
van den Heuvel, Henk [1]
Van Leeuwen, David [1]
Affiliations
[1] Radboud Univ Nijmegen, CLS CLST, Nijmegen, Netherlands
[2] Fryske Akad, Leeuwarden, Netherlands
[3] Omrop Fryslan, Leeuwarden, Netherlands
Keywords
Speaker clustering; speaker diarization; speaker verification; ageing effects; bilingual data; recognition
DOI
10.21437/Interspeech.2017-301
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, we present a new longitudinal and bilingual broadcast database designed for speaker clustering and text-independent speaker verification research. The broadcast data is extracted from the archives of Omrop Fryslan, the regional broadcaster in the province of Fryslan, located in the north of the Netherlands. Two speaker verification tasks are provided in a standard enrollment-test setting with language-consistent trials. The first task contains target trials from all available speakers who appear in at least two different programs, while the second task contains target trials from a subgroup of speakers who appear in programs recorded in multiple years. The second task is designed to investigate the effects of ageing on the accuracy of speaker verification systems. The database also contains unlabeled spoken segments from different radio programs for speaker clustering research. We provide the output of an existing speaker diarization system for baseline verification experiments. Finally, we present baseline speaker verification results obtained with the Kaldi GMM-UBM and DNN-UBM speaker verification systems. This database extends the recently presented open-source Frisian data collection and is publicly available for research purposes.
Pages: 37-41
Page count: 5