Cross-Domain Authorship Attribution Using Pre-trained Language Models

Cited by: 22
Authors: Barlas, Georgios [1]; Stamatatos, Efstathios [1]
Affiliation: [1] Univ Aegean, Karlovassi 83200, Greece
Keywords: Authorship Attribution; Neural network language models; Pre-trained language models
DOI: 10.1007/978-3-030-49161-1_22
Abstract:
Authorship attribution attempts to identify the authors behind texts and has important applications, mainly in cyber-security, digital humanities, and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution, where texts of known authorship (the training set) differ from texts of disputed authorship (the test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a corpus covering several text genres, in which topic and genre are explicitly controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
Pages: 255-266 (12 pages)
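
For intuition, the following is a minimal Python sketch of the kind of normalized language-model scoring the abstract describes. It is a simplification under stated assumptions, not the authors' implementation: the paper attaches author-specific heads to a shared pre-trained model, whereas this sketch uses a separate fine-tuned causal LM per author (GPT-2 is used purely for illustration), and the normalization step, subtracting each model's average loss on a shared normalization corpus, is likewise an illustrative stand-in for the paper's normalization procedure.

    # Minimal sketch of normalized per-author LM scoring; NOT the authors'
    # method. Hypothetical simplification: one fine-tuned GPT-2 per
    # candidate author replaces the paper's shared multi-headed model.
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

    def cross_entropy(model: GPT2LMHeadModel, text: str) -> float:
        """Average per-token cross-entropy of `text` under `model`."""
        ids = tokenizer(text, return_tensors="pt", truncation=True).input_ids
        with torch.no_grad():
            # Passing labels=input_ids makes HF compute the shifted LM loss.
            return model(ids, labels=ids).loss.item()

    def attribute(text: str, author_models: dict, norm_corpus: list) -> str:
        """Assign `text` to the author whose model fits it best after
        subtracting that model's average loss on a shared normalization
        corpus, so that author-specific style, rather than generic
        fluency on the target domain, drives the decision."""
        scores = {}
        for author, model in author_models.items():
            baseline = sum(cross_entropy(model, d) for d in norm_corpus) / len(norm_corpus)
            scores[author] = cross_entropy(model, text) - baseline
        return min(scores, key=scores.get)

The choice of normalization corpus matters here: if it matches the target domain, the baseline absorbs topic- and genre-specific predictability, which is one way to read the abstract's claim that the normalization corpus has a crucial effect in cross-domain attribution.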