Testing acoustic voice quality classification across languages and speech styles

Cited by: 1

Authors:

Braun, Bettina [1]

Dehe, Nicole [1]

Einfeldt, Marieke [1]

Wochner, Daniela [1]

Zahner-Ritter, Katharina [2]

Affiliations:

[1] Univ Konstanz, Dept Linguist, Constance, Germany

[2] Univ Trier, Dept 2, Phonet, Trier, Germany

Source:

INTERSPEECH 2021 | 2021

Keywords:

voice quality; phonation type; acoustic measures; random forest; cross-linguistic generalization; infant-directed speech; German; Chinese; Icelandic; INFANT-DIRECTED SPEECH; PERCEPTION; EMOTION; BREATHY; FEMALE;

DOI:

10.21437/Interspeech.2021-315

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

Many studies relate acoustic voice quality measures to perceptual classification. We extend this line of research by training a classifier on a balanced set of perceptually annotated voice quality categories with high inter-rater agreement, and testing it on speech samples from a different language and on a different speech style. Annotations were done on continuous speech from different laboratory settings. In Experiment 1, we trained a random forest with Standard Chinese and German recordings labelled as modal, breathy, or glottalized. The model had an accuracy of 78.7% on unseen data from the same sample (most important variables were harmonics-to-noise ratio, cepstral-peak prominence, and H1-A2). This model was then used to classify data from a different language (Icelandic, Experiment 2) and to classify a different speech style (German infant-directed speech (IDS), Experiment 3). Cross-linguistic generalizability was high for Icelandic (78.6% accuracy), but lower for German IDS (71.7% accuracy). Accuracy on recordings of adult-directed speech from the same speakers as in Experiment 3 (77%, Experiment 4) suggests that it is the special speech style of IDS, rather than the recording setting, that led to lower performance. Results are discussed in terms of efficiency of coding and generalizability across languages and speech styles.
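
The train-once, test-elsewhere design described in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: it is a minimal random forest written in plain Python, run on synthetic data. The three feature names come from the abstract; the class means, the noise level, the `shift` parameter that stands in for a change of language or speech style, and all function names are assumptions made purely for illustration.

```python
import random
from collections import Counter

LABELS = ["modal", "breathy", "glottalized"]
# Hypothetical class means for (HNR, CPP, H1-A2); invented for illustration only.
MEANS = {"modal": (20.0, 15.0, 10.0),
         "breathy": (10.0, 8.0, 20.0),
         "glottalized": (5.0, 12.0, 0.0)}

def make_corpus(n_per_class, shift, rng):
    """Balanced synthetic corpus; `shift` mimics a new language or style."""
    X, y = [], []
    for label in LABELS:
        for _ in range(n_per_class):
            X.append([rng.gauss(m + shift, 3.0) for m in MEANS[label]])
            y.append(label)
    return X, y

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, depth, rng):
    """Grow one tree; leaves are label strings, inner nodes are tuples."""
    if depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in rng.sample(range(len(X[0])), 2):  # random feature subset per split
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] < t]
            right = [i for i, row in enumerate(X) if row[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split: fall back to the majority label
        return Counter(y).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], depth - 1, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], depth - 1, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] < t else right
    return node

def fit_forest(X, y, n_trees, depth, rng):
    """Train each tree on a bootstrap sample of the training data."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], depth, rng))
    return forest

def predict(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def accuracy(forest, X, y):
    return sum(predict(forest, row) == label for row, label in zip(X, y)) / len(y)

rng = random.Random(0)
X_train, y_train = make_corpus(30, 0.0, rng)
X_same, y_same = make_corpus(30, 0.0, rng)    # unseen data, same conditions
X_shift, y_shift = make_corpus(30, 3.0, rng)  # stand-in for a new language or style
forest = fit_forest(X_train, y_train, n_trees=15, depth=3, rng=rng)
acc_same = accuracy(forest, X_same, y_same)
acc_shift = accuracy(forest, X_shift, y_shift)
print(f"same-condition accuracy: {acc_same:.3f}")
print(f"shifted-condition accuracy: {acc_shift:.3f}")
```

The forest is fitted once and then scored on each test set without retraining, which mirrors the logic of Experiments 2 to 4. The accuracies printed here depend entirely on the invented data and say nothing about the reported results.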


Pages: 3920-3924

Number of pages: 5

50 items in total

[31] A comparison of acoustic correlates of voice quality across different recording devices: a cautionary tale
Penney, Joshua
Gibson, Andy
Cox, Felicity
Proctor, Michael
Szakay, Anita
INTERSPEECH 2021, 2021: 1389 - 1393
[32] ACOUSTIC PROPERTIES OF VOICE TIMBRE TYPES AND THEIR INFLUENCE ON VOICE CLASSIFICATION
CLEVELAND, TF
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 1977, 61 (06): 1622 - 1629
[33] SPEAKER AND LANGUAGE INDEPENDENT VOICE QUALITY CLASSIFICATION APPLIED TO UNLABELLED CORPORA OF EXPRESSIVE SPEECH
Kane, John
Scherer, Stefan
Aylett, Matthew
Morency, Louis-Philippe
Gobl, Christer
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2013: 7982 - 7986
[34] Venezuelan voice database for voice quality testing
Jimenez, Jesus J. G.
Diaz, Jose. A.
Pacheco, Jose
INGENIERIA UC, 2013, 20 (01): 17 - 24
[35] Acoustic Correlates of Voice Quality Improvement by Voice Training
Aikawa, Kiyoaki
Uenuma, Junko
Akitake, Tomoko
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4, 2010: 2886 - +
[36] Voice quality in telephone speech: Comparing acoustic measures between VoIP telephone and high-quality recordings
Xu, Chenzi
Wormald, Jessica
Foulkes, Paul
Harrison, Philip
Hughes, Vincent
Welch, Poppy
Kelly, Finnian
van de Vloed, David
INTERSPEECH 2024, 2024: 1570 - 1574
[37] The effect of speech melody on voice quality
Swerts, M
Veldhuis, R
SPEECH COMMUNICATION, 2001, 33 (04): 297 - 303
[38] The analysis of voice quality in speech processing
Keller, E
NONLINEAR SPEECH MODELING AND APPLICATIONS, 2005, 3445: 54 - 73
[39] Acoustic Voice Quality Index - AVQI for brazilian portuguese speakers: analysis of different speech material
Englert, Marina
Lima, Livia
Constantini, Ana Carolina
Latoszek, Ben Barsties, V
Maryn, Youri
Behlau, Mara
CODAS, 2019, 31 (01):
[40] The emotional quality of speech in voice services
Maffiolo, V
Chateau, N
ERGONOMICS, 2003, 46 (13-14): 1375 - 1385