Unifying Lexical, Syntactic, and Structural Representations of Written Language for Authorship Attribution

Cited by: 0
Authors
Jafariakinabad F. [1]
Hua K.A. [1]
Affiliation
[1] University of Central Florida, Orlando, FL
Keywords
Authorship attribution; Deep neural networks; Document analysis; Natural language processing; Syntax encoding
DOI
10.1007/s42979-021-00911-2
Abstract
Writing style is a combination of consistent decisions that a specific author makes at different levels of language production, including the lexical, syntactic, and structural levels. Recent work on neural-network-based style analysis largely lacks multi-level modeling of writing style. In this paper, we introduce a style-aware neural model that encodes document information at three stylistic levels and evaluate it on the task of authorship attribution. First, we propose a simple way to jointly encode the syntactic and lexical representations of sentences. Subsequently, we employ an attention-based hierarchical neural network to encode the syntactic and semantic structure of sentences in documents while rewarding the sentences that contribute more to capturing writing style. Our experimental results on four benchmark datasets show the benefits of encoding document information at all three stylistic levels compared with baseline methods from the literature. Additionally, we adopt a transfer learning approach and use deep contextualized word representations (ELMo) in our model to measure the impact of ELMo's lower-level linguistic representations versus its higher-level ones on authorship attribution. According to our experimental results, lower-level linguistic representations, which mainly carry syntactic information, perform better in authorship attribution than higher-level linguistic representations, which mainly carry semantic information. © 2021, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
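The pipeline described in the abstract (a joint lexical-syntactic token encoding, then attention-weighted pooling of sentence vectors into a document vector) can be illustrated with a minimal, untrained sketch. It is not the authors' implementation. It assumes that the joint encoding is a concatenation of a word embedding and a part-of-speech (POS) tag embedding, replaces the paper's sentence encoder with a simple average, and scores sentences by a dot product with a fixed context vector, whereas a trained model would learn that vector. All names (`make_table`, `encode_token`, `encode_sentence`, `attend`, `EMB_DIM`, `POS_DIM`) and the hand-tagged toy document are hypothetical.

```python
import math
import random

# Hypothetical toy dimensions; the paper's actual sizes are not given here.
EMB_DIM = 4  # size of the lexical (word) embedding
POS_DIM = 2  # size of the syntactic (POS tag) embedding


def make_table(keys, dim, seed):
    """Random embedding table (stands in for learned embeddings)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for k in keys}


def encode_token(word, tag, word_emb, pos_emb):
    """Assumed joint encoding: word vector concatenated with its POS vector."""
    return word_emb[word] + pos_emb[tag]


def encode_sentence(tokens, word_emb, pos_emb):
    """Average the joint token vectors (a simplification of a sentence encoder)."""
    vecs = [encode_token(w, t, word_emb, pos_emb) for w, t in tokens]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(sentence_vecs, context):
    """Sentence-level attention: weight sentences by their dot product with a context vector."""
    scores = [sum(a * b for a, b in zip(s, context)) for s in sentence_vecs]
    weights = softmax(scores)
    dim = len(sentence_vecs[0])
    doc = [sum(w * s[i] for w, s in zip(weights, sentence_vecs)) for i in range(dim)]
    return doc, weights


# Toy document: sentences of (word, POS tag) pairs, tagged by hand for illustration.
doc = [
    [("the", "DT"), ("cat", "NN"), ("sat", "VBD")],
    [("it", "PRP"), ("slept", "VBD")],
]
words = sorted({w for sent in doc for w, _ in sent})
tags = sorted({t for sent in doc for _, t in sent})
word_emb = make_table(words, EMB_DIM, seed=0)
pos_emb = make_table(tags, POS_DIM, seed=1)

sent_vecs = [encode_sentence(s, word_emb, pos_emb) for s in doc]
context = [0.5] * (EMB_DIM + POS_DIM)  # fixed here; learned in a real model
doc_vec, weights = attend(sent_vecs, context)
print(len(doc_vec), [round(w, 3) for w in weights])
```

Because the attention weights sum to one, sentences that align more closely with the context vector contribute more to the document representation; in a trained network, this is one way a model can reward the sentences most informative about style.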
Related papers
50 in total
  • [31] Authorship Attribution Using a Neural Network Language Model
    Ge, Zhenhao
    Sun, Yufang
    Smith, Mark J. T.
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016 : 4212 - 4213
  • [32] LINGUISTIC PROFILE OF WRITTEN MEDIA - ANALYSIS OF LEXICAL, SYNTACTIC AND NORMATIVE INDICES
    TREMBLAY, L
    FRANCAIS MODERNE, 1996, 64 (02) : 169 - 192
  • [33] Testing the abstractness of children's linguistic representations: lexical and structural priming of syntactic constructions in young children
    Savage, C
    Lieven, E
    Theakston, A
    Tomasello, M
    DEVELOPMENTAL SCIENCE, 2003, 6 (05) : 557 - 567
  • [34] OBJECTS AND PLACES - GEOMETRIC AND SYNTACTIC REPRESENTATIONS IN EARLY LEXICAL LEARNING
    LANDAU, B
    STECKER, DS
    COGNITIVE DEVELOPMENT, 1990, 5 (03) : 287 - 312
  • [35] Lexical Stress-Based Authorship Attribution with Accurate Pronunciation Patterns Selection
    Ivanov, Lubomir
    Aebig, Amanda
    Meerman, Stephen
    TEXT, SPEECH, AND DIALOGUE (TSD 2018), 2018, 11107 : 67 - 75
  • [36] Syntactic influences on lexical and morphological processing in language production
    Ferreira, VS
    Humphreys, KR
    JOURNAL OF MEMORY AND LANGUAGE, 2001, 44 (01) : 52 - 80
  • [37] Cortical correlates of lexical and syntactic sign language processing
    MacSweeney, M
    Woll, B
    Brammer, M
    Campbell, R
    Calvert, G
    David, A
    Williams, SC
    McGuire, PK
    NEUROIMAGE, 2001, 13 (06) : S563 - S563
  • [38] Phonological, lexical, and syntactic components of language development - Preface
    Frauenfelder, Ulrich
    Rizzi, Luigi
    Zesiger, Pascal
    LANGUAGE AND SPEECH, 2008, 51 : 1 - 2
  • [39] Lexical activation of cross-language syntactic priming
    Salamoura, Angeliki
    Williams, John N.
    BILINGUALISM-LANGUAGE AND COGNITION, 2006, 9 (03) : 299 - 307
  • [40] A Scalable Distributed Syntactic, Semantic, and Lexical Language Model
    Tan, Ming
    Zhou, Wenli
    Zheng, Lei
    Wang, Shaojun
    COMPUTATIONAL LINGUISTICS, 2012, 38 (03) : 631 - 671