ProsAudit, a prosodic benchmark for self-supervised speech models

Cited by: 0
Authors
de Seyssel, Maureen [1 ,2 ]
Lavechin, Marvin [1 ,6 ]
Titeux, Hadrien [1 ]
Thomas, Arthur [7 ]
Virlet, Gwendal [5 ,7 ]
Revilla, Andrea Santos [7 ]
Wisniewski, Guillaume [2 ]
Ludusan, Bogdan [3 ,4 ]
Dupoux, Emmanuel [1 ,6 ]
Affiliations
[1] PSL Res Univ, Cognit Machine Learning, EHESS, ENS, CNRS, INRIA, Paris, France
[2] Univ Paris Cite, CNRS, Lab Linguist Formelle, Paris, France
[3] Bielefeld Univ, Fac Linguist & Literary Studies, Bielefeld, Germany
[4] Bielefeld Univ, CITEC, Bielefeld, Germany
[5] INRAE, Inst Agro, PEGASE, St Gilles, France
[6] Meta AI Res, Paris, France
[7] CoML, Paris, France
Keywords
prosody; speech representation; self-supervised learning; human evaluation; spoken language
DOI
10.21437/Interspeech.2023-438
Chinese Library Classification
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
We present ProsAudit, a benchmark in English for assessing structural prosodic knowledge in self-supervised learning (SSL) speech models. It consists of two subtasks, their corresponding metrics, and an evaluation dataset. In the protosyntax task, the model must correctly identify strong versus weak prosodic boundaries. In the lexical task, the model must distinguish between pauses inserted between words and within words. We also provide human evaluation scores on this benchmark. We evaluated a series of SSL models and found that all of them performed above chance on both tasks, even when evaluated on an unseen language. However, non-native models performed significantly worse than native ones on the lexical task, highlighting the importance of lexical knowledge for this task. We also found a clear effect of training-data size, with models trained on more data performing better on both subtasks.
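Both subtasks reduce to the same forced-choice decision: each evaluation item pairs a prosodically natural utterance with a manipulated one, and the model succeeds when it assigns the higher score to the natural member. The sketch below illustrates this kind of pairwise accuracy metric; the `pseudo_log_prob` scoring function and the tie-handling convention are illustrative assumptions, not the paper's implementation.

```python
"""Sketch of the forced-choice metric described above.

`pseudo_log_prob` is a placeholder assumption: any model-specific
utterance score (e.g. summed frame-level log-likelihoods over
discretized speech units) can be plugged in.
"""
from typing import Callable, Iterable, Tuple

import numpy as np

# Each item pairs a prosodically natural utterance with a manipulated
# one (pause at a weak boundary, or pause inside a word).
Pair = Tuple[np.ndarray, np.ndarray]  # (natural, manipulated) waveforms


def pairwise_accuracy(
    pairs: Iterable[Pair],
    pseudo_log_prob: Callable[[np.ndarray], float],
) -> float:
    """Fraction of pairs where the natural member gets the higher score.

    Ties count as half a point (an assumption, borrowed from common
    practice in similar zero-shot speech metrics).
    """
    total, correct = 0, 0.0
    for natural, manipulated in pairs:
        s_nat = pseudo_log_prob(natural)
        s_man = pseudo_log_prob(manipulated)
        if s_nat > s_man:
            correct += 1.0
        elif s_nat == s_man:
            correct += 0.5
        total += 1
    return correct / max(total, 1)
```

Chance level is 0.5 by construction, which is the baseline the "above chance" claim in the abstract refers to.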
Pages: 2963-2967
Number of pages: 5
Related papers
50 records in total
  • [1] Scaling Effect of Self-Supervised Speech Models
    Pu, Jie
    Yang, Yuguang
    Li, Ruirui
    Elibol, Oguz
    Droppo, Jasha
    INTERSPEECH 2021, 2021: 1084-1088
  • [2] On Compressing Sequences for Self-Supervised Speech Models
    Meng, Yen
    Chen, Hsuan-Jui
    Shi, Jiatong
    Watanabe, Shinji
    Garcia, Paola
    Lee, Hung-yi
    Tang, Hao
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022: 1128-1135
  • [3] A clinical benchmark of public self-supervised pathology foundation models
    Campanella, Gabriele
    Chen, Shengjia
    Singh, Manbir
    Verma, Ruchika
    Muehlstedt, Silke
    Zeng, Jennifer
    Stock, Aryeh
    Croken, Matt
    Veremis, Brandon
    Elmas, Abdulkadir
    Shujski, Ivan
    Neittaanmäki, Noora
    Huang, Kuan-lin
    Kwan, Ricky
    Houldsworth, Jane
    Schoenfeld, Adam J.
    Vanderbilt, Chad
    NATURE COMMUNICATIONS, 16(1)
  • [4] PHONEME SEGMENTATION USING SELF-SUPERVISED SPEECH MODELS
    Strgar, Luke
    Harwath, David
    2022 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP, SLT, 2022: 1067-1073
  • [5] On Combining Global and Localized Self-Supervised Models of Speech
    Dumpala, Sri Harsha
    Sastry, Chandramouli S.
    Uher, Rudolf
    Oore, Sageev
    INTERSPEECH 2022, 2022: 3593-3597
  • [6] The Efficacy of Self-Supervised Speech Models as Audio Representations
    Wu, Tung-Yu
    Hsu, Tsu-Yuan
    Li, Chen-An
    Lin, Tzu-Han
    Lee, Hung-yi
    HEAR: HOLISTIC EVALUATION OF AUDIO REPRESENTATIONS, VOL 166, 2021: 90-110
  • [7] Benchmarking Children's ASR with Supervised and Self-supervised Speech Foundation Models
    Fan, Ruchao
    Shankar, Natarajan Balaji
    Alwan, Abeer
    INTERSPEECH 2024, 2024: 5173-5177
  • [8] Word Discovery in Visually Grounded, Self-Supervised Speech Models
    Peng, Puyuan
    Harwath, David
    INTERSPEECH 2022, 2022: 2823-2827
  • [9] Membership Inference Attacks Against Self-supervised Speech Models
    Tseng, Wei-Cheng
    Kao, Wei-Tsung
    Lee, Hung-yi
    INTERSPEECH 2022, 2022: 5040-5044