Discriminative Self-training for Punctuation Prediction

Cited by: 3
Authors:
Chen, Qian [1 ]
Wang, Wen [1 ]
Chen, Mengzhe [1 ]
Zhang, Qinglin [1 ]
Affiliation:
[1] Alibaba Group, Speech Lab, Hangzhou, People's Republic of China
Source:
INTERSPEECH 2021
Keywords:
punctuation prediction; self-training; label smoothing; Transformer; BERT;
DOI:
10.21437/Interspeech.2021-246
Abstract:
Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving the readability of ASR transcripts and the performance of downstream natural language processing applications. However, achieving good punctuation prediction performance often requires large amounts of labeled speech transcripts, which are expensive and laborious to obtain. In this paper, we propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts. Experimental results on the English IWSLT2011 benchmark test set and an internal Chinese spoken language dataset demonstrate that the proposed approach achieves significant improvements in punctuation prediction accuracy over strong baselines, including BERT, RoBERTa, and ELECTRA models, and outperforms the vanilla self-training approach. We establish a new state of the art (SOTA) on the IWSLT2011 test set, outperforming the previous SOTA model by 1.3% absolute gain on F1.
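To make the general idea concrete, below is a minimal sketch (not the authors' released code) of a self-training step for token-level punctuation prediction that combines a supervised loss with a down-weighted, label-smoothed loss on pseudo-labeled transcripts. The toy model, the label set `PUNCT_LABELS`, and the values of `pseudo_label_weight` and `smoothing` are illustrative assumptions; the paper's exact weighting scheme and its discriminative variant of label smoothing are not reproduced here.

```python
# Minimal sketch of self-training with a weighted pseudo-label loss and
# label smoothing for punctuation prediction. All names and hyperparameters
# below are assumptions for illustration, not values from the paper.
import torch
import torch.nn as nn

PUNCT_LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]  # assumed label set
NUM_LABELS = len(PUNCT_LABELS)


class PunctuationTagger(nn.Module):
    """Toy stand-in for a Transformer encoder with a token-classification head."""

    def __init__(self, vocab_size=30522, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.classifier = nn.Linear(hidden, NUM_LABELS)

    def forward(self, token_ids):
        hidden = self.encoder(self.embed(token_ids))
        return self.classifier(hidden)  # (batch, seq_len, NUM_LABELS)


def self_training_step(model, optimizer, labeled_batch, unlabeled_batch,
                       pseudo_label_weight=0.5, smoothing=0.1):
    """One update combining supervised loss and a weighted pseudo-label loss."""
    ce_plain = nn.CrossEntropyLoss()
    ce_smooth = nn.CrossEntropyLoss(label_smoothing=smoothing)

    # Supervised loss on human-labeled transcripts.
    tokens, labels = labeled_batch
    sup_logits = model(tokens)
    sup_loss = ce_plain(sup_logits.reshape(-1, NUM_LABELS), labels.reshape(-1))

    # Teacher pass: generate pseudo labels on unlabeled transcripts.
    # (A full setup would use a separate, frozen teacher model.)
    with torch.no_grad():
        pseudo_labels = model(unlabeled_batch).argmax(dim=-1)

    # Student pass: train on pseudo labels with label smoothing and a
    # down-weighting factor, so noisy pseudo labels contribute less.
    unsup_logits = model(unlabeled_batch)
    unsup_loss = ce_smooth(unsup_logits.reshape(-1, NUM_LABELS),
                           pseudo_labels.reshape(-1))

    loss = sup_loss + pseudo_label_weight * unsup_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    model = PunctuationTagger()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    labeled = (torch.randint(0, 30522, (2, 16)),
               torch.randint(0, NUM_LABELS, (2, 16)))
    unlabeled = torch.randint(0, 30522, (2, 16))
    print(self_training_step(model, opt, labeled, unlabeled))
```

In practice, the pseudo labels would come from a strong fine-tuned teacher (e.g., a BERT-based tagger) rather than the student itself, and the weighting and smoothing would be applied as the paper prescribes; this sketch only illustrates the overall loop.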
Pages: 771 - 775
Page count: 5
Related Papers
50 records total
  • [31] Adaptive Self-Training for Object Detection
    Vandeghen, Renaud
    Louppe, Gilles
    Van Droogenbroeck, Marc
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 914 - 923
  • [32] Self-training of Residents in the Specialization Process
    Gonzalez Mesa, Maria Isabel
    Zerquera Alvarez, Carlos Esteban
    Machin Asia, Annia
    MEDISUR-REVISTA DE CIENCIAS MEDICAS DE CIENFUEGOS, 2014, 12 (01): 329 - 333
  • [33] Self-Training for Unsupervised Parsing with PRPN
    Mohananey, Anhad
    Kann, Katharina
    Bowman, Samuel R.
    16TH INTERNATIONAL CONFERENCE ON PARSING TECHNOLOGIES AND IWPT 2020 SHARED TASK ON PARSING INTO ENHANCED UNIVERSAL DEPENDENCIES, 2020, : 105 - 110
  • [34] Self-Training of ESD for Experienced Endoscopists
    Takahashi, Morio
    Katayama, Yasumi
    GASTROINTESTINAL ENDOSCOPY, 2012, 75 (04) : 373 - 373
  • [35] A Unified Contrastive Loss for Self-training
    Gauffre, Aurelien
    Horvat, Julien
    Amini, Massih-Reza
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES-RESEARCH TRACK AND DEMO TRACK, PT VIII, ECML PKDD 2024, 2024, 14948 : 3 - 18
  • [36] An approach to mobile robot self-training
    Golovko, V
    Ignatiuk, O
    Sauta, V
    PROCEEDINGS OF THE IEEE INTELLIGENT VEHICLES SYMPOSIUM 2000, 2000, : 608 - 613
  • [37] Self-Training with Selection-by-Rejection
    Zhou, Yan
    Kantarcioglu, Murat
    Thuraisingham, Bhavani
    12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM 2012), 2012, : 795 - 803
  • [38] PATTERN RECOGNITION IN SELF-TRAINING MODE
    LAPTEV, VA
    MILENKIY, AV
    ENGINEERING CYBERNETICS, 1966, (06): 104 - &
  • [39] Cycle Self-Training for Domain Adaptation
    Liu, Hong
    Wang, Jianmin
    Long, Mingsheng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [40] An approach to self-training of the mobile robot
    Golovko, V
    Ignatiuk, O
    Sadykhov, R
    IDAACS'2001: PROCEEDINGS OF THE INTERNATIONAL WORKSHOP ON INTELLIGENT DATA ACQUISITION AND ADVANCED COMPUTING SYSTEMS: TECHNOLOGY AND APPLICATION, 2001, : 11 - 15