Cross-Domain Authorship Attribution Using Pre-trained Language Models

被引:22
|
作者
Barlas, Georgios [1 ]
Stamatatos, Efstathios [1 ]
机构
[1] Univ Aegean, Karlovassi 83200, Greece
关键词
Authorship Attribution; Neural network language models; Pre-trained language models;
D O I
10.1007/978-3-030-49161-1_22
中图分类号
学科分类号
摘要
Authorship attribution attempts to identify the authors behind texts and has important applications mainly in cyber-security, digital humanities and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution where texts of known authorship (training set) differ from texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a controlled corpus covering several text genres where topic and genre is specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
引用
收藏
页码:255 / 266
页数:12
相关论文
共 50 条
  • [31] Labeling Explicit Discourse Relations Using Pre-trained Language Models
    Kurfali, Murathan
    TEXT, SPEECH, AND DIALOGUE (TSD 2020), 2020, 12284 : 79 - 86
  • [32] Enhancing Turkish Sentiment Analysis Using Pre-Trained Language Models
    Koksal, Omer
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [33] Automated LOINC Standardization Using Pre-trained Large Language Models
    Tu, Tao
    Loreaux, Eric
    Chesley, Emma
    Lelkes, Adam D.
    Gamble, Paul
    Bellaiche, Mathias
    Seneviratne, Martin
    Chen, Ming-Jun
    MACHINE LEARNING FOR HEALTH, VOL 193, 2022, 193 : 343 - 355
  • [34] Controlling Translation Formality Using Pre-trained Multilingual Language Models
    Rippeth, Elijah
    Agrawal, Sweta
    Carpuat, Marine
    PROCEEDINGS OF THE 19TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE TRANSLATION (IWSLT 2022), 2022, : 327 - 340
  • [35] Repairing Security Vulnerabilities Using Pre-trained Programming Language Models
    Huang, Kai
    Yang, Su
    Sun, Hongyu
    Sun, Chengyi
    Li, Xuejun
    Zhang, Yuqing
    52ND ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS WORKSHOP VOLUME (DSN-W 2022), 2022, : 111 - 116
  • [36] A Study of Pre-trained Language Models in Natural Language Processing
    Duan, Jiajia
    Zhao, Hui
    Zhou, Qian
    Qiu, Meikang
    Liu, Meiqin
    2020 IEEE INTERNATIONAL CONFERENCE ON SMART CLOUD (SMARTCLOUD 2020), 2020, : 116 - 121
  • [37] From Cloze to Comprehension: Retrofitting Pre-trained Masked Language Models to Pre-trained Machine Reader
    Xu, Weiwen
    Li, Xin
    Zhang, Wenxuan
    Zhou, Meng
    Lam, Wai
    Si, Luo
    Bing, Lidong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [38] Pre-trained models for natural language processing: A survey
    Qiu XiPeng
    Sun TianXiang
    Xu YiGe
    Shao YunFan
    Dai Ning
    Huang XuanJing
    SCIENCE CHINA-TECHNOLOGICAL SCIENCES, 2020, 63 (10) : 1872 - 1897
  • [39] Probing Pre-Trained Language Models for Disease Knowledge
    Alghanmi, Israa
    Espinosa-Anke, Luis
    Schockaert, Steven
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3023 - 3033
  • [40] Analyzing Individual Neurons in Pre-trained Language Models
    Durrani, Nadir
    Sajjad, Hassan
    Dalvi, Fahim
    Belinkov, Yonatan
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 4865 - 4880