Crowdsourcing Detection of Sampling Biases in Image Datasets

Cited by: 14
Authors
Hu, Xiao [1]
Wang, Haobo [1]
Vegesana, Anirudh [1]
Dube, Somesh [1]
Yu, Kaiwen [1]
Kao, Gore [1]
Chen, Shuo-Han [1, 2]
Lu, Yung-Hsiang [1]
Thiruvathukal, George K. [1, 3]
Yin, Ming [1]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Acad Sinica, Taipei, Taiwan
[3] Loyola Univ, New Orleans, LA 70118 USA
Source
WEB CONFERENCE 2020: PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE (WWW 2020) | 2020
Keywords
sampling bias; crowdsourcing; image dataset; workflow design
DOI
10.1145/3366423.3380063
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Despite many exciting innovations in computer vision, recent studies reveal a number of risks in existing computer vision systems, suggesting that the results of such systems may be unfair and untrustworthy. Many of these risks can be partly attributed to the use of a training image dataset that exhibits sampling biases and thus does not accurately reflect the real visual world. Being able to detect potential sampling biases in a visual dataset prior to model development is thus essential for mitigating fairness and trustworthiness concerns in computer vision. In this paper, we propose a three-step crowdsourcing workflow to get humans into the loop for facilitating bias discovery in image datasets. Through two sets of evaluation studies, we find that the proposed workflow can effectively organize the crowd to detect sampling biases both in datasets that are artificially created with designed biases and in real-world image datasets that are widely used in computer vision research and system development.
Pages: 2955 - 2961
Number of pages: 7
Related papers
50 in total
  • [31] Image concept detection in imbalanced datasets with ensemble of convolutional neural networks
    Bahrami, Maryam
    Sajedi, Hedieh
    INTELLIGENT DATA ANALYSIS, 2019, 23 (05) : 1131 - 1144
  • [32] Analysis of biases in automatic white balance datasets and methods
    Buzzelli, Marco
    Zini, Simone
    Bianco, Simone
    Ciocca, Gianluigi
    Schettini, Raimondo
    Tchobanou, Mikhail K.
    COLOR RESEARCH AND APPLICATION, 2023, 48 (01) : 40 - 62
  • [33] IMAGE-ENHANCEMENT BY TRACKING AND SAMPLING IN THE DETECTION PLANE
    REINHOLZ, F
    WILSON, T
    OPTIK, 1994, 96 (02) : 59 - 64
  • [34] Automatic Detection of Galaxy Type From Datasets of Galaxies Image Based on Image Retrieval Approach
    Abd El Aziz, Mohamed
    Selim, I. M.
    Xiong, Shengwu
    SCIENTIFIC REPORTS, 2017, 7
  • [35] Language of Mechanisation Crowdsourcing Datasets from the Living with Machines Project
    Ridge, Mia
    Pedrazzini, Nilo
    Vieira, Miguel
    Ciula, Arianna
    Mcgillivray, Barbara
    JOURNAL OF OPEN HUMANITIES DATA, 2024, 10
  • [36] Approximate Detection Method for Image Up-Sampling
    Tu, Ching-Ting
    Lin, Hwei-Jen
    Yang, Fu-Wen
    Chang, Hsiao-Wei
    KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS, 2014, 8 (02) : 462 - 482
  • [37] Automatic Detection of Galaxy Type From Datasets of Galaxies Image Based on Image Retrieval Approach
    Mohamed Abd El Aziz
    I. M. Selim
    Shengwu Xiong
    Scientific Reports, 7
  • [38] Differentially Private Location Protection for Worker Datasets in Spatial Crowdsourcing
    To, Hien
    Ghinita, Gabriel
    Fan, Liyue
    Shahabi, Cyrus
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2017, 16 (04) : 934 - 949
  • [39] Sampling of temporal networks: Methods and biases
    Rocha, Luis E. C.
    Masuda, Naoki
    Holme, Petter
    PHYSICAL REVIEW E, 2017, 96 (05)
  • [40] THE ESTIMATION OF SAMPLING BIASES FOR MALE TSETSE
    ROGERS, DJ
    INSECT SCIENCE AND ITS APPLICATION, 1984, 5 (05) : 369 - 373