LEARNING DOMAIN INVARIANT REPRESENTATIONS FOR CHILD-ADULT CLASSIFICATION FROM SPEECH

Cited by: 0

Authors:

Lahiri, Rimita [1]

Kumar, Manoj [1]

Bishop, Somer [2]

Narayanan, Shrikanth [1]

Affiliations:

[1] Univ Southern Calif, Signal Anal & Interpretat Lab, Los Angeles, CA 90007 USA

[2] Univ Calif San Francisco, Dept Psychiat, San Francisco, CA USA

Source:

2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020

Keywords:

Child speech; domain adversarial learning; gradient reversal; autism spectrum disorder; UNITED-STATES; AUTISM; ADVERSARIAL;

DOI:

10.1109/icassp40776.2020.9054276

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Diagnostic procedures for autism spectrum disorder (ASD) involve semi-naturalistic interactions between the child and a clinician. Computational methods to analyze these sessions require an end-to-end speech and language processing pipeline that goes from raw audio to clinically meaningful behavioral features. An important component of this pipeline is the ability to automatically detect who is speaking when, i.e., to perform child-adult speaker classification. This binary classification task is often confounded by variability in the participants' speech and in background conditions. Further, scarcity of training data often restricts the direct application of conventional deep learning methods. In this work, we address two major sources of variability, the age of the child and the data collection location, using domain adversarial learning, which does not require labeled target-domain data. We use two methods, generative adversarial training with an inverted label loss and a gradient reversal layer, to learn speaker embeddings invariant to the above sources of variability, and we analyze the conditions under which the proposed techniques improve over conventional learning methods. Using a large corpus of ADOS-2 (Autism Diagnostic Observation Schedule, 2nd Edition) sessions, we demonstrate up to 13.45% and 6.44% relative improvements over conventional learning methods.
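
The two adversarial mechanisms named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the class `GradientReversal`, the functions `binary_cross_entropy` and `inverted_label_loss`, the scaling factor `lam`, and all numeric values are hypothetical names and toy inputs chosen for illustration. The sketch assumes a binary domain label (for example, one of two collection sites) and shows only the core idea: a gradient reversal layer is the identity in the forward pass and negates (and scales) the domain classifier's gradient in the backward pass, while the inverted label loss trains the encoder against flipped domain labels.

```python
import math


class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is a hypothetical trade-off weight for the adversarial signal.
        self.lam = lam

    def forward(self, features):
        # The embedding passes through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # The gradient flowing back into the encoder is sign-flipped and scaled,
        # so the encoder is pushed to make domains harder to tell apart.
        return [-self.lam * g for g in grad_output]


def binary_cross_entropy(p, label):
    """Cross-entropy for a predicted probability p and a 0/1 label."""
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))


def inverted_label_loss(p_domain, domain_label):
    """GAN-style encoder loss: cross-entropy against the flipped domain label."""
    return binary_cross_entropy(p_domain, 1 - domain_label)


# Toy values, for illustration only.
grl = GradientReversal(lam=0.5)
embedding = [0.2, -1.0, 3.0]
domain_grad = [1.0, -2.0, 0.5]   # pretend gradient from a domain classifier

passed = grl.forward(embedding)
reversed_grad = grl.backward(domain_grad)
```

In this toy setting, `passed` equals the input embedding and `reversed_grad` is the domain gradient with its sign flipped and scaled by `lam`. A real system would implement the same behavior inside an automatic differentiation framework rather than by hand.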


Pages: 6749-6753

Page count: 5

50 records in total

[1] META-LEARNING FOR ROBUST CHILD-ADULT CLASSIFICATION FROM SPEECH
Koluguri, Nithin Rao
Kumar, Manoj
Kim, So Hyun
Lord, Catherine
Narayanan, Shrikanth
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, : 8094 - 8098
[2] DEVELOPING NEURAL REPRESENTATIONS FOR ROBUST CHILD-ADULT DIARIZATION
Krishnamachari, Suchitra
Kumar, Manoj
Kim, So Hyun
Lord, Catherine
Narayanan, Shrikanth
2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 590 - 597
[3] Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism
Lahiri, Rimita
Feng, Tiantian
Hebbar, Rajat
Lord, Catherine
Kim, So Hyun
Narayanan, Shrikanth
INTERSPEECH 2023, 2023, : 3557 - 3561
[4] Learning Domain-Invariant Representations from Text for Domain Generalization
Zhang, Huihuang
Hu, Haigen
Chen, Qi
Zhou, Qianwei
Jiang, Mingfeng
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, 14432 : 118 - 129
[5] On Learning Invariant Representations for Domain Adaptation
Zhao, Han
des Combes, Remi Tachet
Zhang, Kun
Gordon, Geoffrey J.
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 97, 2019, 97
[6] Speech and language processing for assessing child-adult interaction based on diarization and location
Hansen, John H. L.
Najafian, Maryam
Lileikyte, Rasa
Irvin, Dwight
Rous, Beth
INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2019, 22 (03) : 697 - 709
[7] Learning Class-Aligned and Generalized Domain-Invariant Representations for Speech Emotion Recognition
Xiao, Yufeng
Zhao, Huan
Li, Tingting
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2020, 4 (04) : 480 - 489
[8] Learning Domain Invariant Word Representations for Parsing Domain Adaptation
Qiao, Xiuming
Zhang, Yue
Zhao, Tiejun
NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, 11838 : 801 - 813
[9] LEARNING MODALITY-INVARIANT REPRESENTATIONS FOR SPEECH AND IMAGES
Leidal, Kenneth
Harwath, David
Glass, James
2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, : 424 - 429
[10] End-to-end child-adult speech diarization in naturalistic conditions of preschool classrooms
Kothalkar, Prasanna V.
Irvin, Dwight
Buzhardt, Jay
Hansen, John H.
JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, 153 (03):
