Gender identification on Twitter

Cited by: 14
Authors
Ikae, Catherine [1]
Savoy, Jacques [1]
Affiliations
[1] Univ Neuchatel, Comp Sci Dept, Neuchatel, Switzerland
Keywords
STYLE;
DOI
10.1002/asi.24541
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
To determine the gender of a text's author, various feature types have been suggested (e.g., function words, letter n-grams), leading to a huge number of stylistic markers. To predict the target category, different machine learning models have been suggested (e.g., logistic regression, decision tree, k-nearest neighbors, support vector machine, naive Bayes, neural networks, and random forest). In this study, our first objective is to determine whether the same model always achieves the best effectiveness when considering similar corpora under the same conditions. Thus, based on 7 CLEF-PAN collections, this study analyzes the effectiveness of 10 different classifiers. Our second aim is to propose a 2-stage feature selection that reduces the feature set to a few hundred terms without any significant change in the performance level compared to approaches using all the attributes (increase of around 5% after applying the proposed feature selection). Based on our experiments, neural networks or random forests tend, on average, to produce the highest effectiveness. Moreover, empirical evidence indicates that the feature set can be reduced to around 300 features without penalizing effectiveness. Finally, based on such reduced feature sets, an analysis reveals some of the specific terms that clearly discriminate between the 2 genders.
Pages: 58-69
Page count: 12
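The abstract mentions a 2-stage feature selection that shrinks the feature set to around 300 features, but it does not say what the two stages are. The sketch below is only an illustration under stated assumptions, not the authors' method: stage 1 is assumed to be a document-frequency cutoff, and stage 2 is assumed to rank the surviving terms by a chi-square score against the two classes. The function name `two_stage_select` and the parameters `min_df` and `k` are hypothetical.

```python
from collections import Counter


def two_stage_select(docs, labels, min_df=2, k=300):
    """Illustrative 2-stage term selection (assumed design, not the paper's).

    docs   -- list of token lists, one per author or document
    labels -- list of class labels, exactly two distinct values
    min_df -- stage 1: minimum number of documents a term must occur in
    k      -- stage 2: number of top-ranked terms to keep
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("expected exactly two classes")

    # Stage 1 (assumption): discard terms that occur in too few documents.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    candidates = {t for t, count in df.items() if count >= min_df}

    # Per-class document frequencies for the first class.
    n = len(docs)
    n_first = sum(1 for y in labels if y == classes[0])
    n_second = n - n_first
    df_first = Counter()
    for tokens, y in zip(docs, labels):
        if y == classes[0]:
            df_first.update(set(tokens) & candidates)

    # Stage 2 (assumption): chi-square score of a 2x2 term/class table.
    scores = {}
    for t in candidates:
        a = df_first[t]      # first-class documents containing t
        b = df[t] - a        # second-class documents containing t
        c = n_first - a      # first-class documents without t
        d = n_second - b     # second-class documents without t
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[t] = n * (a * d - b * c) ** 2 / denom if denom else 0.0

    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

A term that appears in every document, or equally often in both classes, scores 0 and falls to the bottom of the ranking, while terms concentrated in one class rise to the top. The default `k=300` mirrors the feature-set size reported in the abstract; everything else is a placeholder choice.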