Discriminative Self-training for Punctuation Prediction

Cited by: 3

Authors:

Chen, Qian [1]

Wang, Wen [1]

Chen, Mengzhe [1]

Zhang, Qinglin [1]

Affiliations:

[1] Alibaba Grp, Speech Lab, Hangzhou, Peoples R China

Source:

INTERSPEECH 2021 | 2021

Keywords:

punctuation prediction; self-training; label smoothing; Transformer; BERT

DOI:

10.21437/Interspeech.2021-246

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology]

Discipline classification codes:

100104; 100213

Abstract:

Punctuation prediction for automatic speech recognition (ASR) output transcripts plays a crucial role in improving both the readability of the transcripts and the performance of downstream natural language processing applications. However, achieving good punctuation prediction performance often requires large amounts of labeled speech transcripts, which are expensive and laborious to obtain. In this paper, we propose a Discriminative Self-Training approach with weighted loss and discriminative label smoothing to exploit unlabeled speech transcripts. Experimental results on the English IWSLT2011 benchmark test set and an internal Chinese spoken language dataset demonstrate that the proposed approach significantly improves punctuation prediction accuracy over strong baselines, including BERT, RoBERTa, and ELECTRA models. The proposed Discriminative Self-Training approach also outperforms the vanilla self-training approach. We establish a new state of the art (SOTA) on the IWSLT2011 test set, outperforming the current SOTA model by an absolute gain of 1.3% in F1.
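The abstract names two ingredients, a weighted loss and label smoothing applied differently to different data, without giving formulas. The sketch below is one plausible reading, not the authors' implementation: it assumes that human-labeled and pseudo-labeled examples each get their own smoothing factor and that the pseudo-labeled term is scaled by a weight. The function names, the parameters `eps_labeled`, `eps_pseudo`, and `pseudo_weight`, and their default values are all hypothetical.

```python
import math


def smoothed_cross_entropy(probs, target, epsilon):
    """Cross-entropy of predicted class probabilities against a
    label-smoothed target: (1 - epsilon) on the target class plus
    epsilon spread uniformly over all classes."""
    num_classes = len(probs)
    loss = 0.0
    for k, p in enumerate(probs):
        q = (1.0 - epsilon) if k == target else 0.0
        q += epsilon / num_classes
        loss -= q * math.log(p)
    return loss


def self_training_loss(labeled, pseudo,
                       eps_labeled=0.0, eps_pseudo=0.1, pseudo_weight=0.5):
    """Weighted sum of the mean loss on human-labeled examples and the
    mean loss on pseudo-labeled examples.

    `labeled` and `pseudo` are lists of (probs, target) pairs.
    All hyperparameters here are illustrative assumptions.
    """
    loss_labeled = sum(
        smoothed_cross_entropy(p, t, eps_labeled) for p, t in labeled
    ) / len(labeled)
    loss_pseudo = sum(
        smoothed_cross_entropy(p, t, eps_pseudo) for p, t in pseudo
    ) / len(pseudo)
    return loss_labeled + pseudo_weight * loss_pseudo
```

In this sketch, a larger `eps_pseudo` softens the targets that come from a teacher model's predictions, which may be noisy, while `pseudo_weight` limits how much those examples contribute relative to the human-labeled ones.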


Pages: 771-775

Number of pages: 5

