CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding

Cited by: 0
|
Authors
Ma, Yixiao [1 ]
Wu, Yueyue [2 ,3 ,4 ]
Su, Weihang [2 ,3 ,4 ]
Ai, Qingyao [2 ,3 ,4 ]
Liu, Yiqun [2 ,3 ,4 ]
Affiliations
[1] Huawei Cloud BU, Shenzhen, Guangdong, Peoples R China
[2] Quan Cheng Lab, Nanjing, Peoples R China
[3] Tsinghua Univ, Inst Internet Judiciary, Beijing, Peoples R China
[4] Tsinghua Univ, DCST, Beijing, Peoples R China
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Legal case retrieval is a critical process for modern legal information systems. While recent studies have utilized pre-trained language models (PLMs) based on the general-domain self-supervised pre-training paradigm to build models for legal case retrieval, there are limitations in using general-domain PLMs as backbones. Specifically, these models may not fully capture the underlying legal features in legal case documents. To address this issue, we propose CaseEncoder, a legal document encoder that leverages fine-grained legal knowledge in both the data sampling and pre-training phases. In the data sampling phase, we enhance the quality of the training data by utilizing fine-grained law article information to guide the selection of positive and negative examples. In the pre-training phase, we design legal-specific pre-training tasks that align with the judging criteria of relevant legal cases. Based on these tasks, we introduce an innovative loss function called Biased Circle Loss to enhance the model's ability to recognize fine-grained case relevance. Experimental results on multiple benchmarks demonstrate that CaseEncoder significantly outperforms both existing general pre-training models and legal-specific pre-training models in zero-shot legal case retrieval. The source code of CaseEncoder can be found at https://github.com/myx666/CaseEncoder.
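The abstract names Biased Circle Loss as a variant of Circle Loss (Sun et al., 2020) but does not state its formula. As background, the sketch below implements the ordinary Circle Loss on pairwise similarity scores; the "biased" weighting that CaseEncoder adds for fine-grained relevance grades is paper-specific and not reproduced here, and the function name and defaults are illustrative.

```python
import math

def circle_loss(sim_pos, sim_neg, m=0.25, gamma=64.0):
    """Standard Circle Loss over similarity scores in [0, 1].

    sim_pos: similarities between the anchor case and positive cases
    sim_neg: similarities between the anchor case and negative cases
    m:       relaxation margin; gamma: scale factor
    """
    op, on = 1.0 + m, -m   # optimum targets for positive / negative pairs
    dp, dn = 1.0 - m, m    # decision boundaries
    # Each pair gets an adaptive weight: pairs far from their optimum
    # are penalized more strongly than pairs already near it.
    pos = sum(math.exp(-gamma * max(op - s, 0.0) * (s - dp)) for s in sim_pos)
    neg = sum(math.exp(gamma * max(s - on, 0.0) * (s - dn)) for s in sim_neg)
    return math.log1p(pos * neg)
```

A well-separated anchor (high positive similarity, low negative similarity) yields a loss near zero, while overlapping scores are penalized heavily; CaseEncoder's biased variant reportedly reshapes these weights to reflect graded, rather than binary, case relevance.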
Pages: 7134-7143
Page count: 10
Related papers
50 records in total
  • [31] NMT Enhancement based on Knowledge Graph Mining with Pre-trained Language Model
    Yang, Hao
    Qin, Ying
    Deng, Yao
    Wang, Minghan
    2020 22ND INTERNATIONAL CONFERENCE ON ADVANCED COMMUNICATION TECHNOLOGY (ICACT): DIGITAL SECURITY GLOBAL AGENDA FOR SAFE SOCIETY!, 2020, : 185 - 189
  • [32] Vietnamese Sentence Paraphrase Identification using Pre-trained Model and Linguistic Knowledge
    Dien Dinh
    Nguyen Le Thanh
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2021, 12 (08) : 796 - 806
  • [33] Probing Pre-Trained Language Models for Disease Knowledge
    Alghanmi, Israa
    Espinosa-Anke, Luis
    Schockaert, Steven
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3023 - 3033
  • [34] Dynamic Knowledge Distillation for Pre-trained Language Models
    Li, Lei
    Lin, Yankai
    Ren, Shuhuai
    Li, Peng
    Zhou, Jie
    Sun, Xu
    2021 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2021), 2021, : 379 - 389
  • [35] Commonsense Knowledge Transfer for Pre-trained Language Models
    Zhou, Wangchunshu
    Le Bras, Ronan
    Choi, Yejin
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 5946 - 5960
  • [36] AP-BERT: enhanced pre-trained model through average pooling
    Zhao, Shuai
    Zhang, Tianyu
    Hu, Man
    Chang, Wen
    You, Fucheng
    APPLIED INTELLIGENCE, 2022, 52 (14) : 15929 - 15937
  • [37] Enhanced Pre-Trained Xception Model Transfer Learned for Breast Cancer Detection
    Joshi, Shubhangi A.
    Bongale, Anupkumar M.
    Olsson, P. Olof
    Urolagin, Siddhaling
    Dharrao, Deepak
    Bongale, Arunkumar
    COMPUTATION, 2023, 11 (03)
  • [38] AP-BERT: enhanced pre-trained model through average pooling
    Shuai Zhao
    Tianyu Zhang
    Man Hu
    Wen Chang
    Fucheng You
    Applied Intelligence, 2022, 52 : 15929 - 15937
  • [39] Vision Enhanced Generative Pre-trained Language Model for Multimodal Sentence Summarization
    Jing, Liqiang
    Li, Yiren
    Xu, Junhao
    Yu, Yongcan
    Shen, Pei
    Song, Xuemeng
    MACHINE INTELLIGENCE RESEARCH, 2023, 20 (02) : 289 - 298
  • [40] Exploring Transfer Learning for Enhanced Seed Classification: Pre-trained Xception Model
    Gulzar, Yonis
    Unal, Zeynep
    Ayoub, Shahnawaz
    Reegu, Faheem Ahmad
    15TH INTERNATIONAL CONGRESS ON AGRICULTURAL MECHANIZATION AND ENERGY IN AGRICULTURE, ANKAGENG 2023, 2024, 458 : 137 - 147