LEARNING DOMAIN INVARIANT REPRESENTATIONS FOR CHILD-ADULT CLASSIFICATION FROM SPEECH

Times cited: 0
Authors
Lahiri, Rimita [1]
Kumar, Manoj [1]
Bishop, Somer [2]
Narayanan, Shrikanth [1]
Affiliations
[1] Univ Southern Calif, Signal Anal & Interpretat Lab, Los Angeles, CA 90007 USA
[2] Univ Calif San Francisco, Dept Psychiat, San Francisco, CA USA
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020
Keywords
Child speech; domain adversarial learning; gradient reversal; autism spectrum disorder; UNITED-STATES; AUTISM; ADVERSARIAL
DOI
10.1109/icassp40776.2020.9054276
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
Diagnostic procedures for ASD (autism spectrum disorder) involve semi-naturalistic interactions between the child and a clinician. Computational methods to analyze these sessions require an end-to-end speech and language processing pipeline that goes from raw audio to clinically meaningful behavioral features. An important component of this pipeline is the ability to automatically detect who is speaking when, i.e., to perform child-adult speaker classification. This binary classification task is often confounded by variability in the participants' speech and in background conditions. Further, the scarcity of training data often restricts the direct application of conventional deep learning methods. In this work, we address two major sources of variability, the age of the child and the location where the data were collected, using domain adversarial learning, which does not require labeled target-domain data. We use two methods, generative adversarial training with an inverted label loss and a gradient reversal layer, to learn speaker embeddings that are invariant to these sources of variability, and we analyze the conditions under which the proposed techniques improve over conventional learning methods. Using a large corpus of ADOS-2 (Autism Diagnostic Observation Schedule, 2nd edition) sessions, we demonstrate relative improvements of up to 13.45% and 6.44% over conventional learning methods.
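The abstract names two ways of making speaker embeddings domain invariant. The sketch below contrasts them on a hypothetical one-dimensional toy model: a scalar encoder weight `w`, a logistic domain discriminator with parameters `v` and `b`, and a reversal strength `lam`. None of these names or values come from the paper; this illustrates the general techniques, not the authors' implementation. In both schemes the discriminator learns to predict the true domain label (e.g. child age group or collection site). With a gradient reversal layer, the encoder receives the discriminator's gradient multiplied by `-lam`. With an inverted label loss, the encoder instead minimizes the discriminator's cross-entropy against the flipped domain label.

```python
import math


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def bce(p: float, y: float) -> float:
    """Binary cross-entropy for one predicted probability p and target y in {0, 1}."""
    eps = 1e-12  # guards against log(0)
    return -(y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps))


class GradientReversal:
    """Identity in the forward pass; scales the incoming gradient by -lam
    in the backward pass (the general gradient-reversal idea)."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam

    def forward(self, z: float) -> float:
        return z

    def backward(self, grad: float) -> float:
        return -self.lam * grad


def domain_grad_wrt_embedding(z: float, v: float, b: float, y: float) -> float:
    """dL/dz for L = bce(sigmoid(v*z + b), y), using d bce(sigmoid(a), y)/da = sigmoid(a) - y."""
    p = sigmoid(v * z + b)
    return (p - y) * v


# Hypothetical toy values: encoder z = w * x,
# discriminator p(domain = 1 | z) = sigmoid(v * z + b).
w, v, b = 0.5, 2.0, 0.0
x, domain_label = 1.5, 1.0  # hypothetical: 1 = one collection site, 0 = the other
z = w * x

# Discriminator side (both schemes): minimize BCE against the TRUE domain label.
disc_loss = bce(sigmoid(v * z + b), domain_label)
g_disc = domain_grad_wrt_embedding(z, v, b, domain_label)

# Scheme A, gradient reversal: the same domain loss is backpropagated, but the
# GRL flips and scales the gradient before it reaches the encoder (dz/dw = x).
grl = GradientReversal(lam=0.5)  # lam = 0.5 is an arbitrary illustrative choice
g_enc_grl = grl.backward(g_disc) * x

# Scheme B, inverted label loss (GAN-style): the encoder minimizes BCE against
# the FLIPPED domain label rather than reversing the gradient.
enc_inv_loss = bce(sigmoid(v * z + b), 1.0 - domain_label)
g_enc_inv = domain_grad_wrt_embedding(z, v, b, 1.0 - domain_label) * x

print(f"discriminator loss={disc_loss:.4f}, grad wrt z={g_disc:.4f}")
print(f"encoder grad (GRL)={g_enc_grl:.4f}, encoder grad (inverted labels)={g_enc_inv:.4f}")
```

With these values both `g_enc_grl` and `g_enc_inv` are positive. A gradient-descent step therefore lowers `w` and makes the discriminator less confident about the true domain. In a full system the encoder would also receive the gradient of the child-adult classification loss, which this sketch omits.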
Pages: 6749-6753
Number of pages: 5
Related papers
50 records in total
  • [1] META-LEARNING FOR ROBUST CHILD-ADULT CLASSIFICATION FROM SPEECH
    Koluguri, Nithin Rao
    Kumar, Manoj
    Kim, So Hyun
    Lord, Catherine
    Narayanan, Shrikanth
    2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2020, pp. 8094-8098
  • [2] DEVELOPING NEURAL REPRESENTATIONS FOR ROBUST CHILD-ADULT DIARIZATION
    Krishnamachari, Suchitra
    Kumar, Manoj
    Kim, So Hyun
    Lord, Catherine
    Narayanan, Shrikanth
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, pp. 590-597
  • [3] Robust Self Supervised Speech Embeddings for Child-Adult Classification in Interactions involving Children with Autism
    Lahiri, Rimita
    Feng, Tiantian
    Hebbar, Rajat
    Lord, Catherine
    Kim, So Hyun
    Narayanan, Shrikanth
    INTERSPEECH 2023, 2023, pp. 3557-3561
  • [4] Learning Domain-Invariant Representations from Text for Domain Generalization
    Zhang, Huihuang
    Hu, Haigen
    Chen, Qi
    Zhou, Qianwei
    Jiang, Mingfeng
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VIII, 2024, vol. 14432, pp. 118-129
  • [5] On Learning Invariant Representations for Domain Adaptation
    Zhao, Han
    des Combes, Remi Tachet
    Zhang, Kun
    Gordon, Geoffrey J.
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 2019, vol. 97
  • [6] Speech and language processing for assessing child-adult interaction based on diarization and location
    Hansen, John H. L.
    Najafian, Maryam
    Lileikyte, Rasa
    Irvin, Dwight
    Rous, Beth
    INTERNATIONAL JOURNAL OF SPEECH TECHNOLOGY, 2019, vol. 22, no. 3, pp. 697-709
  • [7] Learning Class-Aligned and Generalized Domain-Invariant Representations for Speech Emotion Recognition
    Xiao, Yufeng
    Zhao, Huan
    Li, Tingting
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2020, vol. 4, no. 4, pp. 480-489
  • [8] Learning Domain Invariant Word Representations for Parsing Domain Adaptation
    Qiao, Xiuming
    Zhang, Yue
    Zhao, Tiejun
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING (NLPCC 2019), PT I, 2019, vol. 11838, pp. 801-813
  • [9] LEARNING MODALITY-INVARIANT REPRESENTATIONS FOR SPEECH AND IMAGES
    Leidal, Kenneth
    Harwath, David
    Glass, James
    2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU), 2017, pp. 424-429
  • [10] End-to-end child-adult speech diarization in naturalistic conditions of preschool classrooms
    Kothalkar, Prasanna V.
    Irvin, Dwight
    Buzhardt, Jay
    Hansen, John H.
    JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA, 2023, vol. 153, no. 3