CaseEncoder: A Knowledge-enhanced Pre-trained Model for Legal Case Encoding

Cited: 0
Authors
Ma, Yixiao [1 ]
Wu, Yueyue [2 ,3 ,4 ]
Su, Weihang [2 ,3 ,4 ]
Ai, Qingyao [2 ,3 ,4 ]
Liu, Yiqun [2 ,3 ,4 ]
Affiliations
[1] Huawei Cloud BU, Shenzhen, Guangdong, Peoples R China
[2] Quan Cheng Lab, Nanjing, Peoples R China
[3] Tsinghua Univ, Inst Internet Judiciary, Beijing, Peoples R China
[4] Tsinghua Univ, DCST, Beijing, Peoples R China
Keywords: (none listed)
DOI: not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Legal case retrieval is a critical process for modern legal information systems. While recent studies have built legal case retrieval models on pre-trained language models (PLMs) from the general-domain self-supervised pre-training paradigm, general-domain PLMs are limited as backbones: they may not fully capture the underlying legal features in legal case documents. To address this issue, we propose CaseEncoder, a legal document encoder that leverages fine-grained legal knowledge in both the data sampling and pre-training phases. In the data sampling phase, we improve the quality of the training data by using fine-grained law article information to guide the selection of positive and negative examples. In the pre-training phase, we design legal-specific pre-training tasks that align with the judging criteria of relevant legal cases. Based on these tasks, we introduce an innovative loss function called Biased Circle Loss to enhance the model's ability to recognize case relevance at a fine-grained level. Experimental results on multiple benchmarks demonstrate that CaseEncoder significantly outperforms both existing general-domain pre-training models and legal-specific pre-training models in zero-shot legal case retrieval. The source code of CaseEncoder is available at https://github.com/myx666/CaseEncoder.
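To make the loss concrete, below is a minimal sketch of a Circle-Loss-style objective (Sun et al., CVPR 2020) extended with per-positive bias weights. The exact Biased Circle Loss formulation, including how the bias is derived from fine-grained law article information, is defined in the paper; the function name and the `pos_bias` weighting here are illustrative assumptions, not the authors' implementation.

```python
import torch


def biased_circle_loss_sketch(sim_pos, sim_neg, pos_bias, gamma=32.0, margin=0.25):
    """Circle-Loss-style objective over cosine similarities (sketch).

    sim_pos:  (P,) similarities between a query case and positive cases
    sim_neg:  (N,) similarities between a query case and negative cases
    pos_bias: (P,) weights in (0, 1]; positives sharing more fine-grained
              law-article features get larger weights (assumption -- the
              actual bias in Biased Circle Loss is defined in the paper).
    """
    o_p, o_n = 1.0 + margin, -margin           # optimum pos/neg similarities
    delta_p, delta_n = 1.0 - margin, margin    # decision margins

    alpha_p = torch.clamp_min(o_p - sim_pos, 0.0)  # self-paced positive weights
    alpha_n = torch.clamp_min(sim_neg - o_n, 0.0)  # self-paced negative weights

    # Bias the positive logits so fine-grained matches dominate the gradient.
    logit_p = -gamma * pos_bias * alpha_p * (sim_pos - delta_p)
    logit_n = gamma * alpha_n * (sim_neg - delta_n)

    # log(1 + sum_i exp(logit_p_i) * sum_j exp(logit_n_j))
    return torch.nn.functional.softplus(
        torch.logsumexp(logit_p, dim=0) + torch.logsumexp(logit_n, dim=0)
    )
```

In this reading, sim_pos and sim_neg would be cosine similarities between pooled encoder embeddings of a query case and the positives/negatives sampled under the law-article guidance described above; setting all pos_bias entries to 1 recovers the standard Circle Loss.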
Pages: 7134-7143 (10 pages)
Related Papers (50 in total)
  • [1] DKPLM: Decomposable Knowledge-Enhanced Pre-trained Language Model for Natural Language Understanding
    Zhang, Taolin
    Wang, Chengyu
    Hu, Nan
    Qiu, Minghui
    Tang, Chengguang
    He, Xiaofeng
    Huang, Jun
    THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 11703 - 11711
  • [2] Does the Correctness of Factual Knowledge Matter for Factual Knowledge-Enhanced Pre-trained Language Models?
    Cao, Boxi
    Tang, Qiaoyu
    Lin, Hongyu
    Han, Xianpei
    Sun, Le
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 2327 - 2340
  • [3] Knowledge Enhanced Pre-trained Language Model for Product Summarization
    Yin, Wenbo
    Ren, Junxiang
    Wu, Yuejiao
    Song, Ruilin
    Liu, Lang
    Cheng, Zhen
    Wang, Sibo
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, NLPCC 2022, PT II, 2022, 13552 : 263 - 273
  • [4] A Survey of Knowledge Enhanced Pre-Trained Language Models
    Hu, Linmei
    Liu, Zeyi
    Zhao, Ziwang
    Hou, Lei
    Nie, Liqiang
    Li, Juanzi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (04) : 1413 - 1430
  • [5] A Pre-trained Knowledge Tracing Model with Limited Data
    Yue, Wenli
    Su, Wei
    Liu, Lei
    Cai, Chuan
    Yuan, Yongna
    Jia, Zhongfeng
    Liu, Jiamin
    Xie, Wenjian
    DATABASE AND EXPERT SYSTEMS APPLICATIONS, PT I, DEXA 2024, 2024, 14910 : 163 - 178
  • [6] BertHANK: hierarchical attention networks with enhanced knowledge and pre-trained model for answer selection
    Yang, Haitian
    Zhao, Xuan
    Wang, Yan
    Sun, Degang
    Chen, Wei
    Huang, Weiqing
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (08) : 2189 - 2213
  • [7] SAILER: Structure-aware Pre-trained Language Model for Legal Case Retrieval
    Li, Haitao
    Ai, Qingyao
    Chen, Jia
    Dong, Qian
    Wu, Yueyue
    Liu, Yiqun
    Chen, Chong
    Tian, Qi
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 1035 - 1044
  • [8] Knowledge Grounded Pre-Trained Model For Dialogue Response Generation
    Wang, Yanmeng
    Rong, Wenge
    Zhang, Jianfei
    Ouyang, Yuanxin
    Xiong, Zhang
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [9] Lawformer: A pre-trained language model for Chinese legal long documents
    Xiao, Chaojun
    Hu, Xueyu
    Liu, Zhiyuan
    Tu, Cunchao
    Sun, Maosong
    AI OPEN, 2021, 2 : 79 - 84