Short Texts Representations for Legal Domain Classification

Cited by: 0
Authors
Zymkowski, Tomasz [1]
Szymanski, Julian [1]
Sobecki, Andrzej [1]
Drozda, Pawel [2]
Szalapak, Konrad [3]
Komar-Komarowski, Kajetan [3]
Scherer, Rafal [4]
Affiliations
[1] Gdansk Univ Technol, Fac Elect Telecommun & Informat, Gdansk, Poland
[2] Univ Warmia & Mazury, Olsztyn, Poland
[3] Lex Secure 24H Opieka Prawna, Sopot, Poland
[4] Czestochowa Tech Univ, Dept Intelligent Comp Syst, Czestochowa, Poland
Source
ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2022, PT I | 2023 / Vol. 13588
Keywords
Text representation; Short text classification; Transformer; BERT
DOI
10.1007/978-3-031-23492-7_10
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
This work presents a comparison of text representations used for short text classification with an SVM and a neural network when challenged with imbalanced data. We analyze both direct and indirect methods for selecting the proper category and improve them with various representation techniques. As a baseline, we set up a bag-of-words (BOW) method and then use more sophisticated approaches: word embeddings and transformer-based models. The study was conducted on a dataset from the legal domain, where the task was to select the topic of a discussion with a lawyer. The experiments indicate that a fine-tuned pre-trained BERT model gives the best results for this task.
Pages: 105-114
Number of pages: 10