Short Texts Representations for Legal Domain Classification

Cited by: 0
Authors
Zymkowski, Tomasz [1]
Szymanski, Julian [1]
Sobecki, Andrzej [1]
Drozda, Pawel [2]
Szalapak, Konrad [3]
Komar-Komarowski, Kajetan [3]
Scherer, Rafal [4]
Affiliations
[1] Gdansk Univ Technol, Fac Elect Telecommun & Informat, Gdansk, Poland
[2] Univ Warmia & Mazury, Olsztyn, Poland
[3] Lex Secure 24H Opieka Prawna, Sopot, Poland
[4] Czestochowa Tech Univ, Dept Intelligent Comp Syst, Czestochowa, Poland
Keywords
Text representation; Short text classification; Transformer; BERT
DOI
10.1007/978-3-031-23492-7_10
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This work compares text representations for short text classification with an SVM and a neural network on imbalanced data. We analyze both direct and indirect methods for selecting the proper category and improve them with various representation techniques. As a baseline, we set up a bag-of-words (BOW) method and then use more sophisticated approaches: word embeddings and transformer-based models. The study was conducted on a dataset from the legal domain, where the task was to select the topic of a discussion with a lawyer. The experiments indicate that a fine-tuned pre-trained BERT model gives the best results for this task.
Pages: 105-114
Page count: 10