Crowdsourcing Detection of Sampling Biases in Image Datasets

Cited by: 14

Authors:

Hu, Xiao [1]

Wang, Haobo [1]

Vegesana, Anirudh [1]

Dube, Somesh [1]

Yu, Kaiwen [1]

Kao, Gore [1]

Chen, Shuo-Han [1,2]

Lu, Yung-Hsiang [1]

Thiruvathukal, George K. [1,3]

Yin, Ming [1]

Affiliations:

[1] Purdue Univ, W Lafayette, IN 47907 USA

[2] Acad Sinica, Taipei, Taiwan

[3] Loyola Univ, New Orleans, LA 70118 USA

Source:

WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) | 2020

Keywords:

sampling bias; crowdsourcing; image dataset; workflow design

DOI:

10.1145/3366423.3380063

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Despite many exciting innovations in computer vision, recent studies reveal a number of risks in existing computer vision systems, suggesting that the results of such systems may be unfair and untrustworthy. Many of these risks can be partly attributed to the use of a training image dataset that exhibits sampling biases and thus does not accurately reflect the real visual world. Being able to detect potential sampling biases in the visual dataset prior to model development is thus essential for mitigating fairness and trustworthiness concerns in computer vision. In this paper, we propose a three-step crowdsourcing workflow to get humans into the loop for facilitating bias discovery in image datasets. Through two sets of evaluation studies, we find that the proposed workflow can effectively organize the crowd to detect sampling biases both in datasets that are artificially created with designed biases and in real-world image datasets that are widely used in computer vision research and system development.

Pages: 2955-2961

Page count: 7
