Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 22
Authors:
Barlas, Georgios [1 ]
Stamatatos, Efstathios [1 ]
Affiliations:
[1] Univ Aegean, Karlovassi 83200, Greece
Keywords:
Authorship Attribution; Neural network language models; Pre-trained language models
DOI:
10.1007/978-3-030-49161-1_22
Abstract:
Authorship attribution attempts to identify the authors behind texts and has important applications in cyber-security, digital humanities and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (the training set) differ from texts of disputed authorship (the test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a corpus covering several text genres, in which topic and genre are specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
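The abstract sketches the method only at a high level: texts are scored by per-author language models, and scores are calibrated against a shared normalization corpus before attribution. A minimal sketch of that normalization step follows, assuming simple per-author character-trigram language models as stand-ins for the paper's multi-headed neural language model; every function name, the additive smoothing, and the scoring details are illustrative assumptions, not the authors' implementation.

    # Illustrative only: trigram LMs stand in for the multi-headed neural LM.
    import math
    from collections import Counter

    def train_char_lm(texts, n=3):
        # Count character n-grams and their (n-1)-character contexts for one author.
        ngrams, contexts = Counter(), Counter()
        for t in texts:
            t = "~" * (n - 1) + t  # pad so every position has a full context
            for i in range(len(t) - n + 1):
                ngrams[t[i:i + n]] += 1
                contexts[t[i:i + n - 1]] += 1
        return ngrams, contexts

    def cross_entropy(text, lm, n=3, alpha=0.1, vocab=256):
        # Average negative log-probability per character under an
        # additively smoothed n-gram model (lower = better fit).
        ngrams, contexts = lm
        text = "~" * (n - 1) + text
        total, count = 0.0, 0
        for i in range(len(text) - n + 1):
            p = (ngrams[text[i:i + n]] + alpha) / (contexts[text[i:i + n - 1]] + alpha * vocab)
            total -= math.log(p)
            count += 1
        return total / max(count, 1)

    def attribute(disputed, author_train, norm_corpus, n=3):
        # Normalize each author's raw score by that model's mean score on the
        # normalization corpus, so a model that is simply "looser" overall
        # does not win by default on out-of-domain text.
        scores = {}
        for author, texts in author_train.items():
            lm = train_char_lm(texts, n)
            raw = cross_entropy(disputed, lm, n)
            baseline = sum(cross_entropy(t, lm, n) for t in norm_corpus) / len(norm_corpus)
            scores[author] = raw / baseline
        return min(scores, key=scores.get), scores

Dividing by the normalization-corpus baseline, rather than comparing raw cross-entropies directly, is what makes the per-author models comparable when the disputed text differs in topic or genre; the abstract's point is that the choice of this corpus strongly affects cross-domain accuracy.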
Pages: 255-266
Number of pages: 12
Related Papers
50 in total
  • [21] Evaluating Commonsense in Pre-Trained Language Models
    Zhou, Xuhui
    Zhang, Yue
    Cui, Leyang
    Huang, Dandan
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9733 - 9740
  • [22] Knowledge Inheritance for Pre-trained Language Models
    Qin, Yujia
    Lin, Yankai
    Yi, Jing
    Zhang, Jiajie
    Han, Xu
    Zhang, Zhengyan
    Su, Yusheng
    Liu, Zhiyuan
    Li, Peng
    Sun, Maosong
    Zhou, Jie
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 3921 - 3937
  • [23] Code Execution with Pre-trained Language Models
    Liu, Chenxiao
    Lu, Shuai
    Chen, Weizhu
    Jiang, Daxin
    Svyatkovskiy, Alexey
    Fu, Shengyu
    Sundaresan, Neel
    Duan, Nan
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL 2023, 2023, : 4984 - 4999
  • [24] Probing for Hyperbole in Pre-Trained Language Models
    Schneidermann, Nina Skovgaard
    Hershcovich, Daniel
    Pedersen, Bolette Sandford
    PROCEEDINGS OF THE 61ST ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-SRW 2023, VOL 4, 2023, : 200 - 211
  • [25] Pre-trained language models in medicine: A survey
    Luo, Xudong
    Deng, Zhiqi
    Yang, Binxia
    Luo, Michael Y.
    ARTIFICIAL INTELLIGENCE IN MEDICINE, 2024, 154
  • [26] Pre-trained Language Models for the Legal Domain: A Case Study on Indian Law
    Paul, Shounak
    Mandal, Arpan
    Goyal, Pawan
    Ghosh, Saptarshi
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND LAW, ICAIL 2023, 2023, : 187 - 196
  • [27] Zero-shot domain paraphrase with unaligned pre-trained language models
    Chen, Zheng
    Yuan, Hu
    Ren, Jiankun
    COMPLEX & INTELLIGENT SYSTEMS, 2023, 9 (01) : 1097 - 1110
  • [29] Enhancing Domain Modeling with Pre-trained Large Language Models: An Automated Assistant for Domain Modelers
    Prokop, Dominik
    Stenchlak, Stepan
    Skoda, Petr
    Klimek, Jakub
    Necasky, Martin
    CONCEPTUAL MODELING, ER 2024, 2025, 15238 : 235 - 253
  • [30] TPUF: Enhancing Cross-domain Sequential Recommendation via Transferring Pre-trained User Features
    Ding, Yujia
    Li, Huan
    Chen, Ke
    Shou, Lidan
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 410 - 419