Short Texts Representations for Legal Domain Classification

Cited by: 0
Authors
Zymkowski, Tomasz [1 ]
Szymanski, Julian [1 ]
Sobecki, Andrzej [1 ]
Drozda, Pawel [2 ]
Szalapak, Konrad [3 ]
Komar-Komarowski, Kajetan [3 ]
Scherer, Rafal [4 ]
Affiliations
[1] Gdansk Univ Technol, Fac Elect Telecommun & Informat, Gdansk, Poland
[2] Univ Warmia & Mazury, Olsztyn, Poland
[3] Lex Secure 24H Opieka Prawna, Sopot, Poland
[4] Czestochowa Tech Univ, Dept Intelligent Comp Syst, Czestochowa, Poland
Source
ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2022, PT I | 2023 / Vol. 13588
Keywords
Text representation; Short text classification; Transformer; BERT
DOI
10.1007/978-3-031-23492-7_10
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This work presents a comparison of text representations used for short text classification with an SVM and a neural network when challenged with imbalanced data. We analyze both direct and indirect methods for selecting the proper category and improve them with various representation techniques. As a baseline, we set up a bag-of-words (BOW) method and then use more sophisticated approaches: word embeddings and transformer-based models. The study was conducted on a dataset from the legal domain, where the task was to select the topic of a discussion with a lawyer. The experiments indicate that a fine-tuned pre-trained BERT model gives the best results for this task.
Pages: 105-114
Number of pages: 10
Related papers
50 in total
  • [42] Orthographic features for emotion classification in Chinese in informal short texts
    Chen, I-Hsuan
    Long, Yunfei
    Lu, Qin
    Huang, Chu-Ren
    LANGUAGE RESOURCES AND EVALUATION, 2021, 55 (02) : 329 - 352
  • [43] The Classification of Short Scientific Texts Using Pretrained BERT Model
    Danilov, Gleb
    Ishankulov, Timur
    Kotik, Konstantin
    Orlov, Yuriy
    Shifrin, Mikhail
    Potapov, Alexander
    PUBLIC HEALTH AND INFORMATICS, PROCEEDINGS OF MIE 2021, 2021, 281 : 83 - 87
  • [44] Frequent Use Cases Extraction from Legal Texts in the Data Protection Domain
    Leone, Valentina
    Di Caro, Luigi
    LEGAL KNOWLEDGE AND INFORMATION SYSTEMS (JURIX 2019), 2019, 322 : 193 - 198
  • [45] Creation of a Legal Domain Corpus for the Belarusian Module in NooJ: Texts, Dictionaries, Grammars
    Varanovich, Valery
    Suprunchuk, Mikita
    Zianouka, Yauheniya
    Prakapenka, Tsimafei
    Dolgova, Anna
    Hetsevich, Yuras
    FORMALIZING NATURAL LANGUAGES: APPLICATIONS TO NATURAL LANGUAGE PROCESSING AND DIGITAL HUMANITIES, NOOJ 2022, 2022, 1758 : 151 - 162
  • [46] Legal reelism: Movies as legal texts
    Lucia, C
    CINEASTE, 1999, 25 (01) : 14 - 18
  • [47] Unlocking Practical Applications in Legal Domain: Evaluation of GPT for Zero-Shot Semantic Annotation of Legal Texts
    Savelka, Jaromir
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND LAW, ICAIL 2023, 2023 : 447 - 451
  • [48] Combining Domain Knowledge Extraction With Graph Long Short-Term Memory for Learning Classification of Chinese Legal Documents
    Li, Guodong
    Wang, Zhe
    Ma, Yinglong
    IEEE ACCESS, 2019, 7 : 139616 - 139627
  • [49] Document representations for classification of short Web-page descriptions
    Radovanovic, Milos
    Ivanovic, Mirjana
    DATA WAREHOUSING AND KNOWLEDGE DISCOVERY, PROCEEDINGS, 2006, 4081 : 544 - 553
  • [50] A Self-enriching Methodology for Clustering Narrow Domain Short Texts
    Pinto, David
    Rosso, Paolo
    Jimenez-Salazar, Hector
    COMPUTER JOURNAL, 2011, 54 (07) : 1148 - 1165