UTTERANCE-LEVEL END-TO-END LANGUAGE IDENTIFICATION USING ATTENTION-BASED CNN-BLSTM

Authors
Cai, Weicheng [1, 2]
Cai, Danwei [1]
Huang, Shen [3]
Li, Ming [1]
Affiliations
[1] Duke Kunshan Univ, Data Sci Res Ctr, Kunshan, Peoples R China
[2] Sun Yat Sen Univ, Sch Elect & Informat Technol, Guangzhou, Guangdong, Peoples R China
[3] Tencent Res, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Language identification; utterance-level; end-to-end; attention; CNN-BLSTM; SPEAKER; MACHINES
DOI
10.1109/icassp.2019.8682386
Chinese Library Classification number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we present an end-to-end language identification framework, the attention-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BLSTM). The model operates at the utterance level, which means the utterance-level decision can be obtained directly from the output of the neural network. To handle speech utterances of arbitrary and potentially long duration, we combine the CNN-BLSTM model with a self-attentive pooling layer. The front-end CNN-BLSTM module acts as a local pattern extractor for the variable-length inputs, and the self-attentive pooling layer built on top of it produces the fixed-dimensional utterance-level representation. We conducted experiments on the NIST LRE07 closed-set task, and the results show that the proposed attention-based CNN-BLSTM model achieves error reduction comparable to that of other state-of-the-art utterance-level neural network approaches on the 3-second, 10-second, and 30-second duration tasks.
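The abstract does not spell out how the pooling layer is computed, so the following is only a minimal sketch of one common self-attentive pooling formulation, assumed here rather than taken from the paper. Each frame-level feature x_t is projected to h_t = tanh(W x_t + b), scored against a context vector mu, the scores are softmax-normalized over time, and the frames are averaged with those weights. The names W, b, mu, and self_attentive_pooling, the toy dimensions, and the random inputs standing in for CNN-BLSTM outputs are all hypothetical.

```python
import math
import random


def self_attentive_pooling(frames, W, b, mu):
    """Pool a variable-length list of D-dim frames into one D-dim vector.

    Assumed formulation (not confirmed by the abstract):
      h_t = tanh(W x_t + b), score_t = h_t . mu,
      weights = softmax(score) over time, output = sum_t weights_t * x_t.
    """
    scores = []
    for x in frames:
        # Hidden projection of one frame: W has shape (H, D), b has length H.
        h = [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
             for row, b_i in zip(W, b)]
        # Similarity between the projected frame and the context vector mu.
        scores.append(sum(h_i * mu_i for h_i, mu_i in zip(h, mu)))

    # Softmax over the time axis, shifted by the max for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted mean of the original frames gives a fixed-size representation.
    dim = len(frames[0])
    pooled = [sum(w * x[d] for w, x in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Hypothetical toy setup: random parameters and random "frame features".
rng = random.Random(0)
D, H = 4, 3
W = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(H)]
b = [0.0] * H
mu = [rng.uniform(-1, 1) for _ in range(H)]
short_utt = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(5)]
long_utt = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(40)]

for utt in (short_utt, long_utt):
    pooled, weights = self_attentive_pooling(utt, W, b, mu)
    print(len(utt), "frames ->", len(pooled), "dims")
```

Both the 5-frame and the 40-frame input yield a vector of the same length D, which is the property the abstract relies on for utterances of arbitrary duration. In a trained model W, b, and mu would be learned jointly with the rest of the network; here they are random placeholders.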
Pages: 5991-5995
Number of pages: 5