Measuring Gender: A Machine Learning Approach to Social Media Demographics and Author Profiling

Authors
Kovacs, Erik-Robert [1]
Cotfas, Liviu-Adrian [1]
Delcea, Camelia [1]
Affiliation
[1] Bucharest University of Economic Studies, Department of Economic Informatics and Cybernetics, Bucharest 010552, Romania
Keywords
author profiling; gender identification; ensemble methods; social media analysis; COVID-19; sentiment analysis; Twitter; networks; tweets
DOI
10.1007/978-3-031-41456-5_26
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Social media has become a preeminent medium of communication in the early 21st century, facilitating dialogue between the political sphere, businesses, scientific experts, and everyday people. Researchers in the social sciences increasingly treat social media as a central site of social discourse, but such work is hampered by the lack of demographic data that would connect phenomena originating in social media spaces to their wider social context. Computational social science methods that apply machine learning and deep learning natural language processing (NLP) tools to author profiling (AP) can serve as an essential complement to such research. One of the major demographic categories of interest on social media is the gender distribution of users. We propose an ensemble of machine learning classifiers that first determines whether a user is anonymous, with an F1 score of 90.24%, and then predicts the user's gender from their name, with an F1 score of 89.22%. We apply this classification pipeline to a set of approximately 44,000,000 COVID-19-related posts extracted from the social media platform Twitter and compare the results with those of a benchmark classifier trained on the PAN18 Author Profiling dataset, demonstrating the validity of the proposed approach. We further compare the two methods through an n-gram analysis of the tweet text.
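The two-stage idea summarized in the abstract (an anonymity filter followed by name-based gender prediction, evaluated with F1) can be illustrated with the minimal Python sketch below. It is not the authors' implementation: the paper uses trained machine learning classifiers, whereas the base classifiers here (`has_digits`, `single_token`, `unknown_first_name`), the toy `FIRST_NAMES` lexicon, and the hard majority vote are hypothetical stand-ins chosen only to show how the stages and the metric fit together.

```python
from collections import Counter
from typing import Callable

# HYPOTHETICAL toy lexicon; a real system would rely on a large curated
# name database or a trained model rather than four hand-picked entries.
FIRST_NAMES = {
    "maria": "female",
    "elena": "female",
    "john": "male",
    "andrei": "male",
}

# A base classifier maps a display name to a label string.
Classifier = Callable[[str], str]


def has_digits(name: str) -> str:
    # Handles such as "user12345" are treated as a signal of anonymity.
    return "anonymous" if any(ch.isdigit() for ch in name) else "named"


def single_token(name: str) -> str:
    # Names that look real usually contain at least a first and a last name.
    return "anonymous" if len(name.split()) < 2 else "named"


def unknown_first_name(name: str) -> str:
    # A first token missing from the lexicon counts as a vote for anonymity.
    tokens = name.split()
    first = tokens[0].lower() if tokens else ""
    return "named" if first in FIRST_NAMES else "anonymous"


# HYPOTHETICAL stage-1 ensemble: three binary voters, so no ties are possible.
ANONYMITY_ENSEMBLE: list[Classifier] = [has_digits, single_token, unknown_first_name]


def majority_vote(classifiers: list[Classifier], name: str) -> str:
    """Hard voting: return the label predicted by most base classifiers."""
    votes = Counter(clf(name) for clf in classifiers)
    return votes.most_common(1)[0][0]


def predict_gender(name: str) -> str:
    """Stage 1 filters anonymous users; stage 2 looks up the first name."""
    if majority_vote(ANONYMITY_ENSEMBLE, name) == "anonymous":
        return "anonymous"
    # A "named" majority implies at least one token (empty names fail two voters).
    first = name.split()[0].lower()
    return FIRST_NAMES.get(first, "unknown")


def f1_score(y_true: list[str], y_pred: list[str], positive: str) -> float:
    """Binary F1 = 2PR / (P + R), computed for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, `predict_gender("Maria Popescu")` returns `"female"`, while a digit-laden handle such as `"user12345"` is stopped at the first stage and returns `"anonymous"`. Hard voting over an odd number of binary voters is used here only because it avoids ties; the abstract does not state the actual ensemble members or how their outputs are combined.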
Pages: 337-349
Page count: 13