Combining Self-supervised Learning and Active Learning for Disfluency Detection

Cited by: 4
Authors
Wang, Shaolei [1 ]
Wang, Zhongyuan [1 ]
Che, Wanxiang [1 ]
Zhao, Sendong [1 ]
Liu, Ting [1 ]
Affiliations
[1] Harbin Inst Technol, 2 YiKuang St, Tech & Innovat Bldg, HIT Sci Pk, Harbin 150001, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Disfluency detection; self-supervised learning; active learning; pre-training technology;
DOI
10.1145/3487290
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Spoken language is fundamentally different from written language in that it contains frequent disfluencies, or parts of an utterance that are corrected by the speaker. Disfluency detection (removing these disfluencies) is desirable to clean the input for use in downstream NLP tasks. Most existing approaches to disfluency detection rely heavily on human-annotated data, which is scarce and expensive to obtain in practice. To tackle the training data bottleneck, in this work we investigate methods for combining self-supervised learning and active learning for disfluency detection. First, we construct large-scale pseudo training data by randomly adding or deleting words from unlabeled data and propose two self-supervised pre-training tasks: (i) a tagging task to detect the added noisy words and (ii) sentence classification to distinguish original sentences from grammatically incorrect sentences. We then combine these two tasks to jointly pre-train a neural network. The pre-trained neural network is then fine-tuned using human-annotated disfluency detection training data. The self-supervised learning method can capture task-specific knowledge for disfluency detection and achieves better performance than other supervised methods when fine-tuned on a small annotated dataset. However, because the pseudo training data are generated by simple heuristics and cannot fully cover all disfluency patterns, there is still a performance gap compared to supervised models trained on the full training dataset. We further explore how to bridge this gap by integrating active learning during the fine-tuning process. Active learning strives to reduce annotation costs by choosing the most critical examples to label and can address the weakness of self-supervised learning on a small annotated dataset. We show that by combining self-supervised learning with active learning, our model matches state-of-the-art performance with only about 10% of the original training data on both the commonly used English Switchboard test set and a set of in-house annotated Chinese data.
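
The pseudo-data construction described in the abstract is straightforward to prototype. The sketch below is a minimal illustration, not the authors' exact procedure: the function name, corruption probabilities, and the "ADD"/"O" label scheme are assumptions. It corrupts a fluent sentence by randomly inserting vocabulary words (supervision for the tagging pre-training task) and randomly deleting words (yielding grammatically incorrect sentences for the sentence-classification task).

    import random

    def make_pseudo_example(tokens, vocab, p_add=0.15, p_del=0.15):
        # Corrupt a fluent sentence into a pseudo training example.
        # Returns (noisy_tokens, tag_labels, sentence_label):
        #   tag_labels mark inserted words ("ADD") vs. original words ("O"),
        #   the supervision for the tagging pre-training task;
        #   sentence_label is 1 if the sentence was modified, 0 otherwise,
        #   the supervision for the sentence-classification task.
        noisy, tags = [], []
        modified = False
        for tok in tokens:
            if random.random() < p_add:      # random word insertion
                noisy.append(random.choice(vocab))
                tags.append("ADD")
                modified = True
            if random.random() < p_del:      # random word deletion
                modified = True
                continue
            noisy.append(tok)
            tags.append("O")
        return noisy, tags, int(modified)

    # Example: corrupt one Switchboard-style sentence.
    vocab = ["uh", "well", "i", "mean", "you", "know"]
    print(make_pseudo_example("the flight leaves at noon".split(), vocab))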
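The abstract does not specify the acquisition function used during active learning; a common choice, shown below purely as an assumption, is uncertainty sampling: send for annotation the unlabeled sentences on which the current model's per-token tag distributions have the highest average entropy. The model.predict_proba interface is hypothetical.

    import math

    def select_for_annotation(model, unlabeled, budget):
        # Uncertainty sampling: rank unlabeled sentences by the mean
        # entropy of the model's per-token tag distributions and return
        # the `budget` most uncertain ones for human annotation.
        # model.predict_proba(sent) is a hypothetical method returning
        # one probability distribution (a list of floats) per token.
        def entropy(dist):
            return -sum(p * math.log(p + 1e-12) for p in dist)

        scored = []
        for sent in unlabeled:
            dists = model.predict_proba(sent)
            score = sum(entropy(d) for d in dists) / max(len(dists), 1)
            scored.append((score, sent))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [sent for _, sent in scored[:budget]]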
Pages: 25