Evaluating the Representativeness in the Geographic Distribution of Twitter User Population

Cited by: 7
Authors
Yin, Junjun [1]
Chi, Guangqing [2]
Van Hook, Jennifer [3]
Affiliations
[1] Penn State Univ, Social Sci Res Inst, State Coll, PA 16801 USA
[2] Penn State Univ, Dept Agr Econ Sociol & Educ, State Coll, PA USA
[3] Penn State Univ, Dept Sociol & Criminol, State Coll, PA USA
Keywords
Geo-tagged Tweets; Demographics; Bias; Representativeness; Geographic Distribution
DOI
10.1145/3281354.3281360
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Twitter data are becoming a Big Data stream and have drawn multidisciplinary interest for studying population characteristics and social problems that traditional surveys cannot measure well. However, the use of Twitter data has been strongly resisted because of concerns about representativeness, since little is known about the demographic characteristics of the users. It is critical to evaluate the extent to which Twitter users represent the population across different demographic groups. This study evaluates that representativeness and examines the geographic distribution of the Twitter user population and its correspondence to the real population. Based on estimated Twitter user demographics for the contiguous U.S. in 2014, the preliminary results reveal both over- and under-representation of certain demographic groups relative to the real population at the county level. A representation index is used to assess the representativeness of Twitter samples geographically, which may help further studies identify the determinants of these biases.
Pages: 2
Related papers
50 in total
  • [1] Understanding the Political Representativeness of Twitter Users
    Barbera, Pablo
    Rivero, Gonzalo
    SOCIAL SCIENCE COMPUTER REVIEW, 2015, 33 (06) : 712 - 729
  • [2] Geographic Distribution of the Population of Galicia
    Unknown
    GEOGRAPHICAL TEACHER, 1926, 13 (04): : 342 - 342
  • [3] Homing in on Twitter Users: Evaluating an Enhanced Geoparser for User Profile Locations
    Alex, Beatrice
    Llewellyn, Clare
    Grover, Claire
    Oberlander, Jon
    Tobin, Richard
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 3936 - 3944
  • [4] Evaluating large language models for user stance detection on X (Twitter)
    Gambini, Margherita
    Senette, Caterina
    Fagni, Tiziano
    Tesconi, Maurizio
    MACHINE LEARNING, 2024, 113 (10) : 7243 - 7266
  • [5] Evaluating geographic visualization tools and learning about user tasks
    Tobón, C
    PROCEEDINGS OF THE TWENTY-FIFTH ANNUAL CONFERENCE OF THE COGNITIVE SCIENCE SOCIETY, PTS 1 AND 2, 2003, : 1407 - 1407
  • [6] REPRESENTATION AND MEASUREMENT OF POPULATION DISTRIBUTION IN GEOGRAPHIC SPACE
    WINKLER, W
    METRIKA, 1969, 14 (2-3) : 138 - 163
  • [7] National income taxation and the geographic distribution of population
    Jørn Rattsø
    Hildegunn E. Stokke
    International Tax and Public Finance, 2017, 24 : 879 - 902
  • [8] National income taxation and the geographic distribution of population
    Rattso, Jorn
    Stokke, Hildegunn E.
    INTERNATIONAL TAX AND PUBLIC FINANCE, 2017, 24 (05) : 879 - 902
  • [9] A geographic study of the growth and distribution of population in Michigan
    Genthe, M. K.
    PETERMANNS MITTEILUNGEN, 1916, 62 : 466 - 466
  • [10] POPULATION OF THE NETHERLANDS - GROWTH AND GEOGRAPHIC-DISTRIBUTION
    Unknown
    POPULATION, 1963, 18 (02): : 362 - 363