Short Texts Representations for Legal Domain Classification

Cited by: 0

Authors:

Zymkowski, Tomasz ^{[1]}

Szymanski, Julian ^{[1]}

Sobecki, Andrzej ^{[1]}

Drozda, Pawel ^{[2]}

Szalapak, Konrad ^{[3]}

Komar-Komarowski, Kajetan ^{[3]}

Scherer, Rafal ^{[4]}

Affiliations:

[1] Gdansk Univ Technol, Fac Elect Telecommun & Informat, Gdansk, Poland

[2] Univ Warmia & Mazury, Olsztyn, Poland

[3] Lex Secure 24H Opieka Prawna, Sopot, Poland

[4] Czestochowa Tech Univ, Dept Intelligent Comp Syst, Czestochowa, Poland

Source:

ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2022, PT I | 2023 / Vol. 13588

Keywords:

Text representation; Short text classification; Transformer; BERT;

DOI:

10.1007/978-3-031-23492-7_10

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work presents a comparison of text representations used for short text classification with an SVM and a neural network on imbalanced data. We analyze both direct and indirect methods for selecting the proper category and improve them with various representation techniques. As a baseline, we set up a bag-of-words (BOW) method and then use more sophisticated approaches: word embeddings and transformer-based representations. The study was conducted on a dataset from the legal domain, where the task was to select the topic of a discussion with a lawyer. The experiments indicate that a fine-tuned pre-trained BERT model gives the best results for this task.

Pages: 105 - 114

Number of pages: 10
