Citizens' data afterlives: Practices of dataset inclusion in machine learning for public welfare

Cited by: 0
Authors
Ratner, Helene Friis [1, 2]
Thylstrup, Nanna Bonde [2 ]
Affiliations
[1] Aarhus Univ, Danish Sch Educ DPU, Tuborgvej 164, DK-2400 Copenhagen N, Denmark
[2] Univ Copenhagen, Dept Arts & Cultural Studies, Karen Blixensvej 1, DK-2300 Copenhagen, Denmark
Keywords
Machine learning; Welfare state; Data afterlives; Dataset negotiations; DATABASES; CHILD; CARE
DOI
10.1007/s00146-024-01920-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Public sector adoption of AI techniques in welfare systems recasts historic national data as a resource for machine learning. In this paper, we examine how the use of register data for the development of predictive models produces new 'afterlives' for citizen data. First, we document a Danish research project's practical efforts to develop an algorithmic decision-support model for social workers to classify children's risk of maltreatment. Second, we outline the tensions emerging from project members' negotiations about which datasets to include. Third, we identify three types of afterlives for citizen data in machine learning projects: (1) data afterlives for training and testing the algorithm, acting as 'ground truth' for inferring futures, (2) data afterlives for validating the algorithmic model, acting as markers of robustness, and (3) data afterlives for improving the model's fairness, valuated for reasons of data ethics. We conclude by discussing how, on the one hand, these afterlives engender new ethical relations between state and citizens; and how, on the other hand, they also articulate an alternative view on the value of datasets, posing interesting contrasts between machine learning projects developed within the context of the Danish welfare state and mainstream corporate AI discourses of 'the bigger, the better'.
Pages: 1183-1193
Number of pages: 11