Learning from crowds with sparse and imbalanced annotations

Cited by: 0
Authors
Ye Shi
Shao-Yuan Li
Sheng-Jun Huang
Affiliation
[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology
Source
Machine Learning | 2023 / Vol. 112
Keywords
Crowdsourcing; Sparse annotations; Class-imbalance; Self-training
DOI
Not available
Abstract
Traditional supervised learning requires ground-truth labels for training, but collecting them is difficult in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution by resorting to non-expert crowds. To reduce the effect of labeling errors, a common practice is to distribute each instance to multiple workers, while each worker annotates only a subset of the data, resulting in sparse annotations. In this paper, we show that under class imbalance, even when the ground-truth labels are only slightly imbalanced, the sparse annotations are prone to be skewed in distribution and can severely bias the learning algorithm. To combat this issue, we propose a Distribution Aware Self-training based Crowdsourcing learning (DASC) approach, which supplements the sparse annotations with confident pseudo-annotations while re-balancing the annotation distribution. Specifically, we propose a distribution-aware confidence measure to select the most confident pseudo-annotations, with minority classes selected more frequently and majority classes less frequently. As a universal framework, DASC is applicable to various crowdsourcing methods for consistent performance gains.
We conduct extensive experiments over real-world crowdsourcing benchmarks, from slight to heavy imbalance ratio, with various annotation sparsity levels, and show that DASC substantially improves previous crowdsourcing models by 2%-20% absolute test accuracy, and yields much more balanced annotations.
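The abstract does not give the formula for the distribution aware confidence measure, so the following is only an illustrative sketch of the general idea (select confident pseudo-annotations while favoring minority classes), not the authors' DASC implementation. The function name `select_pseudo_annotations`, the `budget` parameter, and the inverse-frequency quota are all assumptions made for this example.

```python
import numpy as np

def select_pseudo_annotations(probs, class_counts, budget):
    """Pick confident candidates, giving rarer classes a larger share of the budget.

    probs:        (n_candidates, n_classes) predicted class probabilities
    class_counts: current number of annotations observed per class
    budget:       approximate number of pseudo-annotations to add (hypothetical knob)
    Returns a list of (candidate_index, class) pairs.
    """
    probs = np.asarray(probs, dtype=float)
    class_counts = np.asarray(class_counts, dtype=float)
    pred = probs.argmax(axis=1)   # predicted class of each candidate
    conf = probs.max(axis=1)      # confidence of that prediction

    # Assumed weighting: a class's quota is inversely proportional to how
    # often it already appears among the annotations.
    inv = 1.0 / np.maximum(class_counts, 1.0)
    quota = np.floor(budget * inv / inv.sum()).astype(int)

    selected = []
    for k, q in enumerate(quota):
        idx = np.flatnonzero(pred == k)
        # Keep the q most confident candidates predicted as class k.
        top = idx[np.argsort(-conf[idx], kind="stable")[:q]]
        selected.extend((int(i), k) for i in top)
    return selected
```

With annotation counts of 90 and 10 for two classes, this quota rule spends most of the budget on the second (minority) class, which mirrors the "minority more, majority less" behavior described in the abstract; the actual measure in the paper may differ.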
Pages: 1823 - 1845
Page count: 22
Related papers
50 in total
  • [41] Active Learning from Crowds with Unsure Option
    Zhong, Jinhong
    Tang, Ke
    Zhou, Zhi-Hua
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 1061 - 1067
  • [42] Active Learning for Text Mining from Crowds
    Shao, Hao
    ADVANCES IN ARTIFICIAL INTELLIGENCE: FROM THEORY TO PRACTICE (IEA/AIE 2017), PT II, 2017, 10351 : 409 - 418
  • [43] Learning from Crowds by Modeling Common Confusions
    Chu, Zhendong
    Ma, Jing
    Wang, Hongning
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 5832 - 5840
  • [44] Learning from crowds with active learning and self-healing
    Shu, Zhenyu
    Sheng, Victor S.
    Li, Jingjing
    NEURAL COMPUTING & APPLICATIONS, 2018, 30 (09): : 2883 - 2894
  • [46] Cost-sensitive sparse group online learning for imbalanced data streams
    Chen, Zhong
    Sheng, Victor
    Edwards, Andrea
    Zhang, Kun
    MACHINE LEARNING, 2024, 113 (07) : 4407 - 4444
  • [47] Metric Learning from Imbalanced Data
    Gautheron, Leo
    Habrard, Amaury
    Morvant, Emilie
    Sebban, Marc
    2019 IEEE 31ST INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2019), 2019, : 923 - 930
  • [48] SMILE: Cost-sensitive multi-task learning for nuclear segmentation and classification with imbalanced annotations
    Pan, Xipeng
    Cheng, Jijun
    Hou, Feihu
    Lan, Rushi
    Lu, Cheng
    Li, Lingqiao
    Feng, Zhengyun
    Wang, Huadeng
    Liang, Changhong
    Liu, Zhenbing
    Chen, Xin
    Han, Chu
    Liu, Zaiyi
    MEDICAL IMAGE ANALYSIS, 2023, 88
  • [49] Imbalanced Ensemble Classifier for Learning from Imbalanced Business School Dataset
    Chakraborty, Tanujit
    INTERNATIONAL JOURNAL OF MATHEMATICAL ENGINEERING AND MANAGEMENT SCIENCES, 2019, 4 (04) : 861 - 869
  • [50] Cardiac MRI segmentation with sparse annotations: Ensembling deep learning uncertainty and shape priors
    Guo, Fumin
    Ng, Matthew
    Kuling, Grey
    Wright, Graham
    MEDICAL IMAGE ANALYSIS, 2022, 81