Measuring Gender: A Machine Learning Approach to Social Media Demographics and Author Profiling

Authors
Kovacs, Erik-Robert [1]
Cotfas, Liviu-Adrian [1]
Delcea, Camelia [1]
Affiliation
[1] Bucharest Univ Econ Studies, Dept Econ Informat & Cybernet, Bucharest 010552, Romania
Keywords
author profiling; gender identification; ensemble methods; social media analysis; COVID-19; sentiment analysis; Twitter; networks; tweets
DOI
10.1007/978-3-031-41456-5_26
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Social media has become a preeminent medium of communication in the early 21st century, facilitating dialogue among the political sphere, businesses, scientific experts, and everyday people. Researchers in the social sciences increasingly treat social media as a central site of social discourse, but such approaches are hampered by the lack of demographic data that would let them connect phenomena originating in social media spaces to their wider social context. Computational social science methods that apply machine learning and deep learning natural language processing (NLP) tools to the task of author profiling (AP) can serve as an essential complement to such research. One of the main demographic categories of interest on social media is the gender distribution of users. We propose an ensemble of machine learning classifiers that first determines whether a user is anonymous, achieving an F1 score of 90.24%, and then predicts the user's gender from their name, achieving an F1 score of 89.22%. We apply the classification pipeline to a set of approximately 44,000,000 COVID-19-related posts collected from the social media platform Twitter and compare the results with a benchmark classifier trained on the PAN18 Author Profiling dataset, showing the validity of the proposed approach. We also perform an n-gram analysis of the tweet text to further compare the two methods.
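The two-stage pipeline summarized in the abstract (an ensemble that flags anonymous accounts, followed by name-based gender prediction) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the models, features, or voting scheme, so the hard majority vote, the three rule-based anonymity checks (`has_digits`, `too_few_tokens`, `has_handle_chars`), and the tiny lexicon `TOY_FIRST_NAMES` are all hypothetical stand-ins. The sketch also includes a binary F1 computation and a simple word n-gram counter; the abstract does not state which F1 averaging the authors report.

```python
from collections import Counter
from typing import Callable

# Hypothetical sketch: the paper's actual classifiers and features are not
# described in the abstract, so simple rules stand in for trained models.
Classifier = Callable[[str], str]


def majority_vote(classifiers: list[Classifier], x: str) -> str:
    """Hard-voting ensemble: return the label predicted by most classifiers."""
    votes = Counter(clf(x) for clf in classifiers)
    # Ties go to the label encountered first (documented Counter behavior).
    return votes.most_common(1)[0][0]


# Stage 1: toy anonymity detectors (hypothetical rules, not the authors' models).
def has_digits(name: str) -> str:
    return "anonymous" if any(c.isdigit() for c in name) else "named"


def too_few_tokens(name: str) -> str:
    return "anonymous" if len(name.split()) < 2 else "named"


def has_handle_chars(name: str) -> str:
    return "anonymous" if any(c in "_@#" for c in name) else "named"


ANON_CLASSIFIERS: list[Classifier] = [has_digits, too_few_tokens, has_handle_chars]

# Stage 2: toy name-to-gender lexicon (hypothetical; a real system would use
# a trained model or a large name database).
TOY_FIRST_NAMES = {"maria": "female", "elena": "female", "john": "male", "andrei": "male"}


def gender_from_name(name: str) -> str:
    tokens = name.split()
    if not tokens:
        return "unknown"
    return TOY_FIRST_NAMES.get(tokens[0].lower(), "unknown")


def profile(name: str) -> str:
    """Cascade: flag anonymous accounts first, otherwise predict gender from the name."""
    if majority_vote(ANON_CLASSIFIERS, name) == "anonymous":
        return "anonymous"
    return gender_from_name(name)


def f1_score(y_true: list[str], y_pred: list[str], positive: str) -> float:
    """Binary F1 for one positive class (harmonic mean of precision and recall)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_ngrams(texts: list[str], n: int = 2, k: int = 5) -> list[tuple[tuple[str, ...], int]]:
    """Count word n-grams across texts and return the k most frequent."""
    counts: Counter = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Zipping shifted copies of the token list yields consecutive n-grams.
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return counts.most_common(k)


print(profile("Maria Popescu"))  # female
print(profile("covid_news24"))   # anonymous
```

In a real system each rule would be replaced by a trained classifier; the part the sketch is meant to show is the cascade structure, where a post is labeled "anonymous" and stops there, or otherwise passes to gender prediction.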
Pages: 337-349
Page count: 13