Learning from crowds with sparse and imbalanced annotations

Cited by: 0
Authors
Ye Shi
Shao-Yuan Li
Sheng-Jun Huang
Affiliation
[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology
Source
Machine Learning | 2023 / Vol. 112
Keywords
Crowdsourcing; Sparse annotations; Class-imbalance; Self-training
DOI
Not available
Abstract
Traditional supervised learning requires ground-truth labels for training, which are difficult to collect in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution by resorting to non-expert crowds. To reduce the effect of labeling errors, a common practice is to distribute each instance to multiple workers, while each worker annotates only a subset of the data, resulting in the sparse annotation phenomenon. In this paper, we show that under class imbalance, even when the ground-truth labels are only slightly imbalanced, the sparse annotations tend to be skewed and can severely bias the learning algorithm. To combat this issue, we propose a Distribution Aware Self-training based Crowdsourcing learning (DASC) approach, which supplements the sparse annotations with confident pseudo-annotations while re-balancing the annotation distribution. Specifically, we propose a distribution-aware confidence measure to select the most confident pseudo-annotations, with minority classes selected more frequently and majority classes less frequently. As a universal framework, DASC can be applied to various crowdsourcing methods and yields consistent performance gains.
We conduct extensive experiments on real-world crowdsourcing benchmarks, with imbalance ratios ranging from slight to heavy and with various annotation sparsity levels, and show that DASC improves previous crowdsourcing models by 2% to 20% in absolute test accuracy and yields much more balanced annotations.
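The abstract does not define the distribution-aware confidence measure. As a rough illustration only, the sketch below assumes one simple way to make minority classes more likely to be selected: each candidate's confidence is scaled by the inverse frequency of its predicted class among the existing annotations. The function name `select_pseudo_annotations`, the exponent `alpha`, and the add-one smoothing are hypothetical choices for this sketch and are not taken from the DASC paper.

```python
import numpy as np


def select_pseudo_annotations(probs, annotation_counts, budget, alpha=1.0):
    """Pick up to `budget` pseudo-annotations, favoring under-represented classes.

    Hypothetical sketch, not the DASC confidence measure.
    probs: (n, k) predicted class probabilities for unannotated entries.
    annotation_counts: (k,) number of existing annotations per class.
    alpha: strength of the re-balancing (0 means plain confidence ranking).
    Returns the indices of the selected entries and their pseudo-labels.
    """
    probs = np.asarray(probs, dtype=float)
    counts = np.asarray(annotation_counts, dtype=float)

    labels = probs.argmax(axis=1)  # pseudo-label = most probable class
    conf = probs.max(axis=1)       # plain confidence of that label

    # Assumed re-weighting: inverse class frequency with add-one smoothing,
    # normalized so the rarest class keeps weight 1.
    freq = (counts + 1.0) / (counts + 1.0).sum()
    weight = (1.0 / freq) ** alpha
    weight = weight / weight.max()

    score = conf * weight[labels]
    # Highest adjusted scores first; stable sort keeps ties in input order.
    order = np.argsort(-score, kind="stable")[:budget]
    return order, labels[order]


probs = [[0.95, 0.05], [0.90, 0.10], [0.20, 0.80], [0.30, 0.70]]
idx, lab = select_pseudo_annotations(probs, annotation_counts=[90, 10], budget=2)
print(idx, lab)
```

With 90 majority-class and 10 minority-class annotations and a budget of two, the re-weighted scores select the two minority-class candidates (indices 2 and 3), whereas plain confidence ranking (`alpha=0`) would select the two majority-class candidates (indices 0 and 1).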
Pages: 1823-1845
Page count: 22