Learning from crowds with sparse and imbalanced annotations

Authors
Ye Shi
Shao-Yuan Li
Sheng-Jun Huang
Affiliation
[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology
Source
Machine Learning | 2023 / Vol. 112
Keywords
Crowdsourcing; Sparse annotations; Class-imbalance; Self-training;
DOI
Not available
Abstract
Traditional supervised learning requires ground-truth labels for training, but collecting them is difficult in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution by resorting to non-expert crowds. To reduce the effect of labeling errors, a common practice is to distribute each instance to multiple workers, while each worker annotates only a subset of the data, which results in sparse annotations. In this paper, we show that under class imbalance, even when the ground-truth labels are only slightly imbalanced, the sparse annotations are prone to a skewed distribution and can severely bias the learning algorithm. To combat this issue, we propose a Distribution Aware Self-training based Crowdsourcing learning (DASC) approach, which supplements the sparse annotations with confident pseudo-annotations while re-balancing the annotation distribution. Specifically, we propose a distribution-aware confidence measure to select the most confident pseudo-annotations, selecting minority classes more frequently and majority classes less frequently. As a universal framework, DASC is applicable to various crowdsourcing methods and yields consistent performance gains. 
We conduct extensive experiments over real-world crowdsourcing benchmarks, from slight to heavy imbalance ratio, with various annotation sparsity levels, and show that DASC substantially improves previous crowdsourcing models by 2% to 20% absolute test accuracy, and yields much more balanced annotations.
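The abstract does not give the exact form of the distribution-aware confidence measure, so the following is only a minimal sketch of the general idea (keep the most confident pseudo-annotations, with rarer classes kept at a higher rate). The function name `select_pseudo_annotations`, the `alpha` parameter, and the rate rule `(min_count / count) ** alpha` are all assumptions made for illustration and are not taken from the paper.

```python
import math


def select_pseudo_annotations(probs, annotation_counts, alpha=1.0):
    """Pick confident pseudo-annotations, favoring under-represented classes.

    probs: per-instance lists of predicted class probabilities.
    annotation_counts: number of existing annotations per class.
    alpha: hypothetical knob; larger values re-balance more aggressively.
    Returns a sorted list of (instance_index, class_index) pairs.
    """
    # Guard against empty classes so the ratio below is always defined.
    counts = [max(c, 1) for c in annotation_counts]
    min_count = min(counts)
    # Assumed rule: the rarest class keeps all of its candidates (rate 1.0);
    # more frequent classes keep a proportionally smaller share.
    rates = [(min_count / c) ** alpha for c in counts]

    # Group instances by predicted class, remembering the confidence.
    by_class = {k: [] for k in range(len(counts))}
    for i, p in enumerate(probs):
        k = max(range(len(p)), key=p.__getitem__)
        by_class[k].append((p[k], i))

    selected = []
    for k, candidates in by_class.items():
        candidates.sort(reverse=True)  # most confident first
        keep = math.ceil(rates[k] * len(candidates))
        selected.extend((i, k) for _, i in candidates[:keep])
    return sorted(selected)
```

With existing annotation counts of 40 for class 0 and 10 for class 1, this sketch keeps only the top quarter of class-0 candidates but every class-1 candidate, so the added pseudo-annotations pull the annotation distribution toward balance.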
Pages: 1823-1845
Page count: 22
Related papers
50 in total
  • [21] Learning to Simulate Crowds with Crowds
    Talukdar, Bilas
    Zhang, Yunhao
    Weiss, Tomer
    PROCEEDINGS OF SIGGRAPH 2023 POSTERS, SIGGRAPH 2023, 2023,
  • [22] Learning From Crowds With Contrastive Representation
    Yang, Hang
    Li, Xunbo
    Pedrycz, Witold
    IEEE ACCESS, 2023, 11 : 40182 - 40191
  • [23] Listwise Learning to Rank from Crowds
    Wu, Ou
    You, Qiang
    Xia, Fen
    Ma, Lei
    Hu, Weiming
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2016, 11 (01)
  • [24] Learning from Imbalanced Datasets: The Bike-Sharing Inventory Problem Using Sparse Information
    Ceccarelli, Giovanni
    Cantelmo, Guido
    Nigro, Marialisa
    Antoniou, Constantinos
    ALGORITHMS, 2023, 16 (07)
  • [25] On Learning From Game Annotations
    Wirth, Christian
    Fuernkranz, Johannes
    IEEE TRANSACTIONS ON COMPUTATIONAL INTELLIGENCE AND AI IN GAMES, 2015, 7 (03) : 304 - 316
  • [26] Learning from crowds with decision trees
    Wenjun Yang
    Chaoqun Li
    Liangxiao Jiang
    Knowledge and Information Systems, 2022, 64 : 2123 - 2140
  • [27] Learning from crowds with decision trees
    Yang, Wenjun
    Li, Chaoqun
    Jiang, Liangxiao
    KNOWLEDGE AND INFORMATION SYSTEMS, 2022, 64 (08) : 2123 - 2140
  • [28] Weighted Adversarial Learning From Crowds
    Chen, Ziqi
    Jiang, Liangxiao
    Zhang, Wenjun
    Li, Chaoqun
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2024, 17 (06) : 4467 - 4480
  • [29] Batch Reinforcement Learning from Crowds
    Zhang, Guoxi
    Kashima, Hisashi
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV, 2023, 13716 : 38 - 51
  • [30] Learning from Crowds with Annotation Reliability
    Cao, Zhi
    Chen, Enhong
    Huang, Ye
    Shen, Shuanghong
    Huang, Zhenya
    PROCEEDINGS OF THE 46TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, SIGIR 2023, 2023, : 2103 - 2107