The Utility of Context When Extracting Entities From Legal Documents

Cited by: 4
Authors
Donnelly, Jonathan [1]
Roegiest, Adam [1]
Affiliations
[1] Kira Syst, Toronto, ON, Canada
DOI
10.1145/3340531.3412746
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
When reviewing documents for legal tasks such as Mergers and Acquisitions, granular information (such as start dates and exit clauses) needs to be identified and extracted. Inspired by previous work in Named Entity Recognition (NER), we investigate how NER techniques can be leveraged to aid lawyers in this review process. Because target information is extremely rare in legal documents, we find that the traditional approach of tagging every sentence in a document is inferior, in both effectiveness and the data required to train and predict, to a two-stage approach: a first-pass layer identifies sentences that are likely to contain the relevant information, and a more traditional sentence-level sequence tagger then runs on only those sentences. Moreover, we find that such entity-level models can be improved by training on a balanced sample of relevant and non-relevant sentences. We additionally describe the use of our system in production and how its usage by clients means that deep learning architectures tend to be cost inefficient, especially with respect to the time needed to train models.
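The two ideas in the abstract (a first-pass sentence filter ahead of the tagger, and training on a balanced sample of relevant and non-relevant sentences) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names `balanced_sample` and `two_stage_extract`, the keyword-based `toy_filter`, and the regex-based `toy_tagger` are hypothetical stand-ins, not the models or code described by the authors.

```python
import random
import re
from typing import Callable, List, Tuple


def balanced_sample(relevant: List[str], non_relevant: List[str],
                    seed: int = 0) -> List[Tuple[str, int]]:
    """Downsample the larger class so both classes have equal size.

    Returns (sentence, label) pairs, label 1 = relevant, 0 = non-relevant.
    """
    rng = random.Random(seed)
    n = min(len(relevant), len(non_relevant))
    sample = [(s, 1) for s in rng.sample(relevant, n)]
    sample += [(s, 0) for s in rng.sample(non_relevant, n)]
    rng.shuffle(sample)
    return sample


def two_stage_extract(sentences: List[str],
                      is_relevant: Callable[[str], bool],
                      tag: Callable[[str], List[str]]) -> List[Tuple[int, List[str]]]:
    """Run the tagger only on sentences that pass the first-pass filter."""
    results = []
    for index, sentence in enumerate(sentences):
        if not is_relevant(sentence):
            continue  # most sentences are skipped, so the tagger sees little text
        entities = tag(sentence)
        if entities:
            results.append((index, entities))
    return results


# Hypothetical stand-ins for the two stages (illustration only).
MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE = re.compile(r"\b(?:" + MONTHS + r") \d{1,2}, \d{4}\b")


def toy_filter(sentence: str) -> bool:
    """Stage 1: crude keyword check standing in for a sentence classifier."""
    lowered = sentence.lower()
    return "commence" in lowered or "start" in lowered


def toy_tagger(sentence: str) -> List[str]:
    """Stage 2: regex date finder standing in for a sequence tagger."""
    return DATE.findall(sentence)


document = [
    "This Agreement is governed by the laws of Ontario.",
    "The term shall commence on March 1, 2020.",
    "Notices dated January 5, 2019 shall be sent by mail.",
]
extracted = two_stage_extract(document, toy_filter, toy_tagger)
training_pairs = balanced_sample([document[1]], [document[0], document[2]])
```

In this toy run the third sentence contains a date but is never tagged, because the first stage does not flag it as a likely start-date sentence; that is the sense in which the filter supplies context to the tagger.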
Pages: 2397-2404
Number of pages: 8