Crowdsourcing Detection of Sampling Biases in Image Datasets

Times cited: 14
Authors
Hu, Xiao [1]
Wang, Haobo [1]
Vegesana, Anirudh [1]
Dube, Somesh [1]
Yu, Kaiwen [1]
Kao, Gore [1]
Chen, Shuo-Han [1, 2]
Lu, Yung-Hsiang [1]
Thiruvathukal, George K. [1, 3]
Yin, Ming [1]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Acad Sinica, Taipei, Taiwan
[3] Loyola Univ, New Orleans, LA 70118 USA
Source
WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020), 2020
Keywords
sampling bias; crowdsourcing; image dataset; workflow design
DOI
10.1145/3366423.3380063
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Despite many exciting innovations in computer vision, recent studies reveal a number of risks in existing computer vision systems, suggesting that the results of such systems may be unfair and untrustworthy. Many of these risks can be partly attributed to the use of a training image dataset that exhibits sampling biases and thus does not accurately reflect the real visual world. Detecting potential sampling biases in a visual dataset prior to model development is thus essential for mitigating fairness and trustworthiness concerns in computer vision. In this paper, we propose a three-step crowdsourcing workflow that brings humans into the loop to facilitate bias discovery in image datasets. Through two sets of evaluation studies, we find that the proposed workflow can effectively organize the crowd to detect sampling biases both in datasets that are artificially created with designed biases and in real-world image datasets that are widely used in computer vision research and system development.
Pages: 2955 - 2961
Number of pages: 7
Related papers
50 in total
  • [1] Estimating sampling biases in citizen science datasets
    Backstrom, Louis J.
    Callaghan, Corey T.
    Worthington, Hannah
    Fuller, Richard A.
    Johnston, Alison
    IBIS, 2025, 167 (01): 73 - 87
  • [2] Hidden Biases in Unreliable News Detection Datasets
    Zhou, Xiang
    Elfardy, Heba
    Christodoulopoulos, Christos
    Butler, Thomas
    Bansal, Mohit
    16TH CONFERENCE OF THE EUROPEAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (EACL 2021), 2021: 2482 - 2492
  • [3] Cognitive Biases in Crowdsourcing
    Eickhoff, Carsten
    WSDM'18: PROCEEDINGS OF THE ELEVENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2018: 162 - 170
  • [5] Sampling Biases in Datasets of Historical Mean Air Temperature over Land
    Wang, Kaicun
    SCIENTIFIC REPORTS, 2014, 4
  • [6] Balanced Sampling Meets Imbalanced Datasets in SAR Image Classification
    Jahan, Chowdhury Sadman
    Savakis, Andreas
    GEOSPATIAL INFORMATICS XIII, 2023, 12525
  • [7] A Collaborative Training Using Crowdsourcing and Neural Networks on Small and Difficult Image Classification Datasets
    Tomoumi Takase
    SN Computer Science, 2022, 3 (2)
  • [8] Qualification and quantification on viewpoint biases in large scale image datasets for general object recognition
    Qiu Y.
    Satoh Y.
    Suzuki R.
    Kataoka H.
    Iwata K.
    Seimitsu Kogaku Kaishi/Journal of the Japan Society for Precision Engineering, 2019, 85 (12): 1087 - 1093
  • [9] Investigation of Biases in Identity Linkage DataSets
    Kaushal, Rishabh
    Gupta, Shubham
    Kumaraguru, Ponnurangam
    PROCEEDINGS OF THE 35TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING (SAC'20), 2020: 1861 - 1868
  • [10] Image Anomaly Detection with Capsule Networks and Imbalanced Datasets
    Piciarelli, Claudio
    Mishra, Pankaj
    Foresti, Gian Luca
    IMAGE ANALYSIS AND PROCESSING - ICIAP 2019, PT I, 2019, 11751 : 257 - 267