On the role of syntactic dependencies and discourse relations for author and gender identification

Cited by: 10
Authors
Soler-Company, Juan [1]
Wanner, Leo [1, 2]
Affiliations
[1] Pompeu Fabra Univ, Carrer de Roc Boronat 138, Barcelona 08018, Spain
[2] ICREA, Carrer de Roc Boronat 138, Barcelona 08018, Spain
Keywords
Author profiling; Author identification; Gender identification; Text classification; ATTRIBUTION;
DOI
10.1016/j.patrec.2017.12.006
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Author identification and author gender identification are two major tasks in the profiling of authors of written material. Author identification (or, more precisely, "authorship attribution") assigns a piece of written material to one author chosen from a given list of candidate authors. Gender identification predicts the gender of the author (male vs. female). Both tasks are relevant to a number of applications, including plagiarism and deception detection, document authenticity verification, and the investigation of blackmail. The state of the art in both fields tends to rely mainly on lexical and token (sequence) distribution features. This neglects numerous linguistic studies that clearly indicate the high relevance of "deep linguistic" features, i.e., syntactic and discourse features, to the characterization of the style of an author or a group of authors. Our work on author and gender identification confirms this relevance. We show on two different genres, namely blog posts and literary writings, that the use of deep linguistic features is very effective. It leads to an accuracy of >78% (blog posts) and >91% (literary writings) in author identification, and of >89% (blog posts) and >90% (literary writings) in gender identification. © 2017 Elsevier B.V. All rights reserved.
Pages: 87-95
Page count: 9