Using Pre-trained Language Model to Enhance Active Learning for Sentence Matching

Cited by: 0
Authors
Bai, Guirong [1 ,2 ]
He, Shizhu [1 ,2 ]
Liu, Kang [1 ,2 ]
Zhao, Jun [1 ,2 ]
Affiliations
[1] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sentence matching; active learning; pre-trained language model;
DOI
10.1145/3480937
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Active learning is an effective method for substantially reducing the expensive annotation cost of data-driven models. Recently, pre-trained language models have been shown to be powerful at learning language representations. In this article, we demonstrate that a pre-trained language model can also use its learned textual characteristics to enrich the selection criteria of active learning. Specifically, we use the pre-trained language model to provide extra textual criteria for measuring instances, namely noise, coverage, and diversity. With these extra textual criteria, we can select more informative instances for annotation and obtain better results. We conduct experiments on both English and Chinese sentence matching datasets. The experimental results show that the proposed active learning approach is enhanced by the pre-trained language model and achieves better performance.
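As a rough illustration of how such criteria can steer instance selection, the sketch below combines a standard uncertainty score from the task model with a diversity score computed in the embedding space of a pre-trained language model. This is a minimal sketch of the general idea, not the paper's actual method: the functions predict_proba and lm_embed are hypothetical placeholders (in practice they might wrap the sentence-matching classifier's softmax outputs and a BERT-style sentence-pair encoder), and the noise and coverage criteria described in the abstract are omitted.

    import numpy as np

    def entropy(probs):
        """Prediction entropy per instance; higher values mean the task model is less certain."""
        return -(probs * np.log(probs + 1e-12)).sum(axis=1)

    def min_distance(candidates, selected):
        """Distance from each candidate embedding to its nearest already-selected embedding."""
        if len(selected) == 0:
            return np.ones(len(candidates))  # no reference points yet: treat all candidates as equally diverse
        diffs = candidates[:, None, :] - selected[None, :, :]
        return np.linalg.norm(diffs, axis=-1).min(axis=1)

    def select_batch(pool_pairs, predict_proba, lm_embed, batch_size=8, alpha=0.5):
        """Greedily pick a batch that trades off uncertainty (entropy of the task model's
        predictions) against diversity (distance to already-selected instances in the
        pre-trained language model's embedding space)."""
        probs = predict_proba(pool_pairs)  # (N, num_classes), hypothetical task-model scores
        embs = lm_embed(pool_pairs)        # (N, d), hypothetical pre-trained LM embeddings
        unc = entropy(probs)
        unc = (unc - unc.min()) / (unc.max() - unc.min() + 1e-12)

        chosen = []
        chosen_embs = np.empty((0, embs.shape[1]))
        remaining = list(range(len(pool_pairs)))
        for _ in range(min(batch_size, len(remaining))):
            div = min_distance(embs[remaining], chosen_embs)
            div = div / (div.max() + 1e-12)
            score = alpha * unc[remaining] + (1 - alpha) * div
            best = remaining[int(np.argmax(score))]
            chosen.append(best)
            chosen_embs = np.vstack([chosen_embs, embs[best]])
            remaining.remove(best)
        return chosen  # indices into pool_pairs to send for annotation

In a full active learning loop, the selected instances would be annotated, added to the labeled set, and the matching model retrained before the next selection round.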
Pages: 19
Related Papers
50 records in total
  • [21] A Sentence Quality Evaluation Framework for Machine Reading Comprehension Incorporating Pre-trained Language Model
    Meng, Fan-Jun
    He, Ji-Fei
    Xu, Xing-Jian
    Zhao, Ya-Juan
    Sun, Li-Jun
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, ICIC 2023, PT IV, 2023, 14089 : 443 - 455
  • [22] Adapting Generative Pre-trained Language Model for Open-domain Multimodal Sentence Summarization
    Lin, Dengtian
    Jing, Liqiang
    Song, Xuemeng
    Liu, Meng
    Sun, Teng
    Nie, Liqiang
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 195 - 204
  • [23] Efficient Federated Learning with Pre-Trained Large Language Model Using Several Adapter Mechanisms
    Kim, Gyunyeop
    Yoo, Joon
    Kang, Sangwoo
    MATHEMATICS, 2023, 11 (21)
  • [24] Automatic Fixation of Decompilation Quirks Using Pre-trained Language Model
    Kaichi, Ryunosuke
    Matsumoto, Shinsuke
    Kusumoto, Shinji
    PRODUCT-FOCUSED SOFTWARE PROCESS IMPROVEMENT, PROFES 2023, PT I, 2024, 14483 : 259 - 266
  • [25] Surgicberta: a pre-trained language model for procedural surgical language
    Bombieri, Marco
    Rospocher, Marco
    Ponzetto, Simone Paolo
    Fiorini, Paolo
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024, 18 (01) : 69 - 81
  • [26] Harnessing Pre-Trained Sentence Transformers for Offensive Language Detection in Indian Languages
    MKSSS Cummins College of Engineering for Women, Maharashtra, Pune, India
    CEUR Workshop Proc., (427-434)
  • [27] Grammatical Error Correction by Transferring Learning Based on Pre-Trained Language Model
    Han M.
    Wang Y.
    Shanghai Jiaotong Daxue Xuebao/Journal of Shanghai Jiaotong University, 2022, 56 (11): 1554 - 1560
  • [28] Pre-trained Language Model for Biomedical Question Answering
    Yoon, Wonjin
    Lee, Jinhyuk
    Kim, Donghyeon
    Jeong, Minbyul
    Kang, Jaewoo
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2019, PT II, 2020, 1168 : 727 - 740
  • [29] BERTweet: A pre-trained language model for English Tweets
    Dat Quoc Nguyen
    Thanh Vu
    Anh Tuan Nguyen
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING: SYSTEM DEMONSTRATIONS, 2020, : 9 - 14
  • [30] ViDeBERTa: A powerful pre-trained language model for Vietnamese
    Tran, Cong Dao
    Pham, Nhut Huy
    Nguyen, Anh
    Hy, Truong Son
    Vu, Tu
    17TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EACL 2023, 2023, : 1071 - 1078