Gender identification on Twitter

Cited by: 14
Authors
Ikae, Catherine [1]
Savoy, Jacques [1]
Affiliation
[1] Univ Neuchatel, Comp Sci Dept, Neuchatel, Switzerland
Keywords
STYLE;
DOI
10.1002/asi.24541
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
To determine the gender of a text's author, various feature types have been suggested (e.g., function words, letter n-grams, etc.), leading to a huge number of stylistic markers. To determine the target category, different machine learning models have been suggested (e.g., logistic regression, decision tree, k nearest-neighbors, support vector machine, naive Bayes, neural networks, and random forest). In this study, our first objective is to determine whether the same model always achieves the best effectiveness when considering similar corpora under the same conditions. Thus, based on 7 CLEF-PAN collections, this study analyzes the effectiveness of 10 different classifiers. Our second aim is to propose a 2-stage feature selection that reduces the feature set to a few hundred terms without any significant change in the performance level compared to approaches using all the attributes (increase of around 5% after applying the proposed feature selection). Based on our experiments, neural networks or random forests tend, on average, to produce the highest effectiveness. Moreover, empirical evidence indicates that reducing the feature set size to around 300 without penalizing the effectiveness is possible. Finally, based on such reduced feature sets, an analysis reveals some of the specific terms that clearly discriminate between the 2 genders.
Pages: 58 - 69
Number of pages: 12
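
The abstract mentions a 2-stage feature selection that reduces the feature set to around 300 terms, but this record does not say what the two stages are. The sketch below is therefore only an illustration under stated assumptions: stage 1 is assumed to be a document-frequency filter, and stage 2 is assumed to rank the surviving terms by a chi-square score over a two-class split. The function names (`tokenize`, `chi2_score`, `two_stage_selection`) and the parameters `min_df` and `k` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (an assumption; the paper's tokenizer is not given here)."""
    return text.lower().split()


def chi2_score(n_a, n_b, df_a, df_b):
    """Chi-square statistic of the 2x2 table (term present/absent) x (class a/b).

    n_a, n_b: number of documents in each class.
    df_a, df_b: number of documents in each class that contain the term.
    """
    a, b = df_a, df_b                # documents containing the term
    c, d = n_a - df_a, n_b - df_b    # documents not containing the term
    n = n_a + n_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def two_stage_selection(docs, labels, min_df=2, k=300):
    """Return at most k terms, most discriminative first.

    Stage 1 (assumed): drop terms that occur in fewer than min_df documents.
    Stage 2 (assumed): rank the remaining terms by chi-square and keep the top k.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    cls_a, cls_b = classes

    n_docs = Counter(labels)
    doc_freq = {cls_a: Counter(), cls_b: Counter()}
    for text, label in zip(docs, labels):
        # Count each term once per document.
        doc_freq[label].update(set(tokenize(text)))

    # Stage 1: document-frequency filter.
    vocabulary = set(doc_freq[cls_a]) | set(doc_freq[cls_b])
    kept = [t for t in vocabulary
            if doc_freq[cls_a][t] + doc_freq[cls_b][t] >= min_df]

    # Stage 2: sort by descending chi-square, ties broken alphabetically.
    kept.sort(key=lambda t: (-chi2_score(n_docs[cls_a], n_docs[cls_b],
                                         doc_freq[cls_a][t], doc_freq[cls_b][t]), t))
    return kept[:k]
```

A classifier would then be trained on the returned terms only. Whether such a reduced set preserves effectiveness has to be checked empirically on each collection, as the abstract describes.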