Gender identification on Twitter

Cited by: 14
Authors
Ikae, Catherine [1]
Savoy, Jacques [1]
Affiliation
[1] Univ Neuchatel, Comp Sci Dept, Neuchatel, Switzerland
Keywords
STYLE;
DOI
10.1002/asi.24541
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
To determine the gender of a text's author, various feature types have been suggested (e.g., function words, letter n-grams, etc.), leading to a huge number of stylistic markers. To determine the target category, different machine learning models have been suggested (e.g., logistic regression, decision tree, k-nearest neighbors, support vector machine, naive Bayes, neural networks, and random forest). In this study, our first objective is to determine whether the same model always achieves the best effectiveness when considering similar corpora under the same conditions. Thus, based on 7 CLEF-PAN collections, this study analyzes the effectiveness of 10 different classifiers. Our second aim is to propose a 2-stage feature selection that reduces the feature set to a few hundred terms without any significant change in the performance level compared to approaches using all the attributes (increase of around 5% after applying the proposed feature selection). Based on our experiments, neural networks or random forests tend, on average, to produce the highest effectiveness. Moreover, empirical evidence indicates that the feature set can be reduced to around 300 terms without penalizing effectiveness. Finally, based on such reduced feature sets, an analysis reveals some of the specific terms that clearly discriminate between the 2 genders.
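The abstract does not state which criteria the two selection stages use, so the following is only a minimal sketch of what a 2-stage feature selection could look like, not the authors' method. It assumes a document-frequency cutoff for the first stage and a ranking by the difference in relative term frequency between the two categories for the second. The function name `two_stage_select`, the parameters `min_df` and `top_k`, and the toy data are all hypothetical.

```python
from collections import Counter


def two_stage_select(docs, labels, min_df=2, top_k=300):
    """Return up to top_k terms from tokenized docs with two-class labels.

    Both stage criteria below are illustrative assumptions.
    """
    # Stage 1 (assumed): discard terms that occur in fewer than min_df documents.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    candidates = {t for t, n in doc_freq.items() if n >= min_df}

    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")

    # Count the surviving terms separately for each class.
    counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        counts[label].update(t for t in doc if t in candidates)
    totals = {c: sum(counts[c].values()) or 1 for c in classes}
    a, b = classes

    # Stage 2 (assumed): rank by the absolute difference in relative frequency.
    def score(term):
        return abs(counts[a][term] / totals[a] - counts[b][term] / totals[b])

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(candidates, key=lambda t: (-score(t), t))[:top_k]


# Toy usage with hypothetical tokens and labels.
toy_docs = [["a", "b", "x"], ["a", "b", "y"], ["c", "b", "z"], ["c", "b", "w"]]
toy_labels = ["A", "A", "B", "B"]
selected = two_stage_select(toy_docs, toy_labels, min_df=2, top_k=2)
print(selected)
```

With `top_k=300`, the same call would cap the vocabulary at the size the abstract reports as sufficient; the selected terms could then be passed to any of the classifiers listed above.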
Pages: 58-69
Number of pages: 12