Crowdsourcing Detection of Sampling Biases in Image Datasets

Times cited: 14
Authors
Hu, Xiao [1]
Wang, Haobo [1]
Vegesana, Anirudh [1]
Dube, Somesh [1]
Yu, Kaiwen [1]
Kao, Gore [1]
Chen, Shuo-Han [1, 2]
Lu, Yung-Hsiang [1]
Thiruvathukal, George K. [1, 3]
Yin, Ming [1]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Acad Sinica, Taipei, Taiwan
[3] Loyola Univ, New Orleans, LA 70118 USA
Source
WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020
Keywords
sampling bias; crowdsourcing; image dataset; workflow design
DOI
10.1145/3366423.3380063
Chinese Library Classification
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Despite many exciting innovations in computer vision, recent studies reveal a number of risks in existing computer vision systems, suggesting that the results of such systems may be unfair and untrustworthy. Many of these risks can be partly attributed to training image datasets that exhibit sampling biases and thus do not accurately reflect the real visual world. Detecting potential sampling biases in a visual dataset before model development is therefore essential for mitigating fairness and trustworthiness concerns in computer vision. In this paper, we propose a three-step crowdsourcing workflow that brings humans into the loop to facilitate bias discovery in image datasets. Through two sets of evaluation studies, we find that the proposed workflow can effectively organize the crowd to detect sampling biases both in artificially created datasets with deliberately designed biases and in real-world image datasets that are widely used in computer vision research and system development.
Pages: 2955-2961
Number of pages: 7
Related papers
50 in total
  • [41] Sampling biases in analyzes of prescription durations
    Stovring, Henrik
    Pottegard, Anton
    Hallas, Jesper
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2022, 31 : 3 - 4
  • [42] Sampling biases in IP topology measurements
    Lakhina, A
    Byers, JW
    Crovella, M
    Xie, P
    IEEE INFOCOM 2003: THE CONFERENCE ON COMPUTER COMMUNICATIONS, VOLS 1-3, PROCEEDINGS, 2003, : 332 - 341
  • [43] SAMPLING BIASES IN STUDIES OF GENDER AND SCHIZOPHRENIA
    WALKER, EF
    LEWINE, RRJ
    SCHIZOPHRENIA BULLETIN, 1993, 19 (01) : 1 - 7
  • [44] Sampling characteristics and biases of enclosure traps for sampling fishes in estuaries
    Steele, Mark A.
    Schroeter, Stephen C.
    Page, Henry M.
    Estuaries and Coasts, 2006, 29 : 630 - 638
  • [45] Sampling characteristics and biases of enclosure traps for sampling fishes in estuaries
    Steele, Mark A.
    Schroeter, Stephen C.
    Page, Henry M.
    ESTUARIES AND COASTS, 2006, 29 (04) : 630 - 638
  • [46] SEMANTIC CONCEPT DETECTION IN IMBALANCED DATASETS BASED ON DIFFERENT UNDER-SAMPLING STRATEGIES
    Guo, Jinlin
    Foley, Colum
    Gurrin, Cathal
    Lao, Songyang
    2011 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME), 2011,
  • [47] Vehicle image datasets for image classification
    Boonsirisumpun, Narong
    Okafor, Emmanuel
    Surinta, Olarik
    DATA IN BRIEF, 2024, 53
  • [48] QMC Sampling from Empirical Datasets
    Xie, Fei
    Giles, Michael B.
    He, Zhijian
    MONTE CARLO AND QUASI-MONTE CARLO METHODS, MCQMC 2018, 2020, 324 : 523 - 539
  • [49] Finding the Critical Sampling of Big Datasets
    Silva, Jose
    Ribeiro, Bernardete
    Sung, Andrew H.
    ACM INTERNATIONAL CONFERENCE ON COMPUTING FRONTIERS 2017, 2017, : 355 - 360
  • [50] A comprehensive review of optical remote-sensing image object detection datasets
    Yuan, Y.
    Li, L.
    Yao, X.
    Li, L.
    Feng, X.
    Cheng, G.
    Han, J.
    National Remote Sensing Bulletin, 2023, 27 (12) : 2671 - 2687