Speaker-turn aware diarization for speech-based cognitive assessments

Cited by: 0
Authors
Xu, Sean Shensheng [1]
Ke, Xiaoquan [2]
Mak, Man-Wai [2]
Wong, Ka Ho [3]
Meng, Helen [4]
Kwok, Timothy C. Y. [5]
Gu, Jason [6]
Zhang, Jian [7]
Tao, Wei [8]
Chang, Chunqi [1]
Affiliations
[1] Shenzhen Univ, Med Sch, Sch Biomed Engn, Shenzhen, Peoples R China
[2] Hong Kong Polytech Univ, Dept Elect & Informat Engn, Kowloon, Hong Kong, Peoples R China
[3] Chinese Univ Hong Kong, Dept Syst Engn & Engn Management, Shatin, Hong Kong, Peoples R China
[4] Chinese Univ Hong Kong, Dept Med & Therapeut, Shatin, Hong Kong, Peoples R China
[5] Chinese Univ Hong Kong, Jockey Club Ctr Osteoporosis Care & Control, Shatin, Hong Kong, Peoples R China
[6] Dalhousie Univ, Dept Elect & Comp Engn, Halifax, NS, Canada
[7] Shenzhen Univ Med Sch, Med Sch, Sch Pharm, Shenzhen, Peoples R China
[8] Shenzhen Univ, Dept Neurosurg, South China Hosp, Shenzhen, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
speaker diarization; speaker embedding; comprehensive scoring; speaker-turn timestamps; MOCA; dementia detection; MENTAL-STATE-EXAMINATION; IMPAIRMENT; DEMENTIA; MOCA;
DOI
10.3389/fnins.2023.1351848
Chinese Library Classification number
Q189 [Neuroscience]
Discipline classification code
071006
Abstract
Introduction: Speaker diarization is an essential preprocessing step for diagnosing cognitive impairments from speech-based Montreal Cognitive Assessments (MoCA).

Methods: This paper proposes three enhancements to conventional speaker diarization methods for such assessments. The enhancements tackle the challenges of diarizing MoCA recordings on two fronts. First, multi-scale channel interdependence speaker embedding is used as the front-end speaker representation to overcome the acoustic mismatch caused by far-field microphones. Specifically, a squeeze-and-excitation (SE) unit and channel-dependent attention are added to Res2Net blocks for multi-scale feature aggregation. Second, a sequence comparison approach with a holistic view of the whole conversation is applied to measure the similarity of short speech segments in the conversation, which results in a speaker-turn aware scoring matrix for the subsequent clustering step. Third, to further enhance diarization performance, we propose incorporating a pairwise similarity measure so that the speaker-turn aware scoring matrix contains both local and global information across the segments.

Results: Evaluations on an interactive MoCA dataset show that the proposed enhancements lead to a diarization system that outperforms conventional x-vector/PLDA systems under language-, age-, and microphone-mismatch scenarios.

Discussion: The results also show that the proposed enhancements can help hypothesize the speaker-turn timestamps, making the diarization method amenable to datasets without timestamp information.
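As a rough illustration of the squeeze-and-excitation (SE) unit named in the abstract, the following NumPy sketch shows the generic SE operation on a channels-by-frames feature map. It is not the authors' implementation: the function name `se_unit`, the reduction ratio `r`, the tensor sizes, and the random weights are all assumptions made for illustration, and the Res2Net blocks and channel-dependent attention are omitted.

```python
import numpy as np


def se_unit(x, w1, b1, w2, b2):
    """Generic squeeze-and-excitation over a (channels, frames) feature map.

    Hypothetical helper for illustration only; the weights would be learned
    in a real network.
    """
    z = x.mean(axis=1)                          # squeeze: average over frames -> (C,)
    h = np.maximum(0.0, w1 @ z + b1)            # bottleneck layer with ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))    # excitation: sigmoid gates in (0, 1) -> (C,)
    return x * s[:, None]                       # rescale each channel by its gate


# Assumed toy sizes: 8 channels, 50 frames, reduction ratio 4.
rng = np.random.default_rng(0)
C, T, r = 8, 50, 4
x = rng.standard_normal((C, T))
w1 = rng.standard_normal((C // r, C)) * 0.1
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y = se_unit(x, w1, b1, w2, b2)
```

Because each gate lies strictly between 0 and 1, the unit can only attenuate channels relative to one another; it reweights channel importance without changing the shape of the feature map.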
Pages: 14
Related papers
50 in total
  • [1] Speaker Turn Aware Similarity Scoring for Diarization of Speech-Based Cognitive Assessments
    Xu, Sean Shensheng
    Mak, Man-Wai
    Wong, Ka Ho
    Meng, Helen
    Kwok, Timothy C. Y.
    2021 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA ASC), 2021, : 1299 - 1304
  • [2] End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
    Zuluaga-Gomez, Juan
    Huang, Zhaocheng
    Niu, Xing
    Paturi, Rohit
    Srinivasan, Sundararajan
    Mathur, Prashant
    Thompson, Brian
    Federico, Marcello
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 7255 - 7274
  • [3] Neural speech turn segmentation and affinity propagation for speaker diarization
    Yin, Ruiqing
    Bredin, Herve
    Barras, Claude
    19TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2018), VOLS 1-6: SPEECH RESEARCH FOR EMERGING MARKETS IN MULTILINGUAL SOCIETIES, 2018, : 1393 - 1397
  • [4] Age-Invariant Speaker Embedding for Diarization of Cognitive Assessments
    Xu, Sean Shensheng
    Mak, Man-Wai
    Wong, Ka Ho
    Meng, Helen
    Kwok, Timothy C. Y.
    2021 12TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP), 2021,
  • [5] Speaker normalisation for speech-based emotion detection
    Sethu, Vidhyasaharan
    Ambikairajah, Eliathamby
    Epps, Julien
    PROCEEDINGS OF THE 2007 15TH INTERNATIONAL CONFERENCE ON DIGITAL SIGNAL PROCESSING, 2007, : 611 - +
  • [6] Avoiding dominance of speaker features in speech-based depression detection
    Zuo, Lishi
    Mak, Man-Wai
    PATTERN RECOGNITION LETTERS, 2023, 173 : 50 - 56
  • [7] Speech-Based Automated Cognitive Status Assessment
    Hakkani-Tuer, Dilek
    Vergyri, Dimitra
    Tur, Gokhan
    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2, 2010, : 258 - +
  • [8] Speech-based cognitive load monitoring system
    Yin, Bo
    Chen, Fang
    Ruiz, Natalie
    Ambikairajah, Eliathamby
    2008 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, VOLS 1-12, 2008, : 2041 - 2044
  • [9] Joint speaker diarization and speech recognition based on region proposal networks
    Huang, Zili
    Delcroix, Marc
    Garcia, Leibny Paola
    Watanabe, Shinji
    Raj, Desh
    Khudanpur, Sanjeev
    COMPUTER SPEECH AND LANGUAGE, 2022, 72
  • [10] Speaker Diarization of Overlapping Speech based on Silence Distribution in Meeting Recordings
    Yella, Harsha
    Valente, Fabio
    13TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2012 (INTERSPEECH 2012), VOLS 1-3, 2012, : 490 - 493