Natural Language Processing Applications in Case-Law Text Publishing

Cited by: 2
Authors
Tarasconi, Francesco [1]
Botros, Milad [1]
Caserio, Matteo [1]
Sportelli, Gianpiero [1]
Giacalone, Giuseppe [2]
Uttini, Carlotta [2]
Vignati, Luca [2]
Zanetta, Fabrizio [2]
Affiliations
[1] CELI Language Technol, Via San Quintino 31, I-10121 Turin, Italy
[2] Giuffre Francis Lefebvre, Milan, Italy
Keywords
natural language processing; applications; transfer learning; language models; text classification; information extraction; publishing industry; machine learning; BERT fine-tuning; random forest; Italian language;
DOI
10.3233/FAIA200859
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Processing case-law content for electronic publishing is a time-consuming activity that encompasses several sub-tasks and usually involves adding annotations to the original text. Meanwhile, recent advances in Artificial Intelligence and Natural Language Processing enable the automatic and efficient analysis of large volumes of text. In this paper we present our Machine Learning solution to three specific business problems regularly met by a real-world Italian publisher in its day-to-day work: recognition of legal references in text spans, ranking of new content by relevance, and text classification according to a given tree of topics. We experimented with several approaches based on the BERT language model, together with alternatives typically based on Bag-of-Words. The optimal solution, deployed in a controlled production environment, was based on fine-tuned BERT in two of the three cases (extraction of legal references and text classification), while for relevance ranking a Random Forest model with hand-crafted features was preferred. We conclude by discussing the concrete impact of the developed prototypes, as perceived by the publisher.
Pages: 154-163
Page count: 10