Learning from crowds with sparse and imbalanced annotations

Cited by: 0

Authors:

Ye Shi

Shao-Yuan Li

Sheng-Jun Huang

Affiliations:

[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology

Source:

Machine Learning | 2023 / Vol. 112

Keywords:

Crowdsourcing; Sparse annotations; Class-imbalance; Self-training

DOI:

Not available

Abstract:

Traditional supervised learning requires ground-truth labels for training, which are difficult to collect in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution by resorting to non-expert crowds. To reduce the effect of labeling errors, a common practice is to distribute each instance to multiple workers, while each worker annotates only a subset of the data, which results in sparse annotations. In this paper, we show that under class imbalance, even when the ground-truth labels are only slightly imbalanced, the sparse annotations are prone to a skewed distribution and can severely bias the learning algorithm. To combat this issue, we propose a Distribution Aware Self-training based Crowdsourcing learning (DASC) approach, which supplements the sparse annotations with confident pseudo-annotations and at the same time re-balances the annotation distribution. Specifically, we propose a distribution-aware confidence measure to select the most confident pseudo-annotations, with minority classes selected more frequently and majority classes less frequently. As a universal framework, DASC is applicable to various crowdsourcing methods for consistent performance gains.
We conduct extensive experiments over real-world crowdsourcing benchmarks, from slight to heavy imbalance ratio, with various annotation sparsity levels, and show that DASC substantially improves previous crowdsourcing models by 2%-20% absolute test accuracy, and yields much more balanced annotations.
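
The abstract does not give the formula for the distribution-aware confidence measure, so the following is only an illustrative sketch of the general idea, not the DASC method itself. It assumes a hypothetical scoring rule in which a model's confidence is multiplied by a class weight inversely proportional to how many crowd annotations that class already has. The function name `select_pseudo_annotations`, its parameters, and the weighting rule are all assumptions introduced for illustration.

```python
import numpy as np


def select_pseudo_annotations(probs, annotation_counts, budget):
    """Pick up to `budget` (instance, class) pseudo-annotations.

    probs: (n_instances, n_classes) predicted class probabilities.
    annotation_counts: (n_classes,) existing crowd annotations per class.

    Hypothetical rule (not taken from the paper): the confidence of a
    prediction is scaled by max_count / class_count, so classes with few
    annotations are favored during selection.
    """
    probs = np.asarray(probs, dtype=float)
    counts = np.asarray(annotation_counts, dtype=float)

    # Rarer classes get larger weights; guard against zero counts.
    weights = counts.max() / np.maximum(counts, 1.0)

    predicted = probs.argmax(axis=1)      # most likely class per instance
    confidence = probs.max(axis=1)        # plain model confidence
    score = confidence * weights[predicted]

    # Highest re-weighted scores first; stable sort keeps ties in input order.
    order = np.argsort(-score, kind="stable")[:budget]
    return [(int(i), int(predicted[i])) for i in order]


if __name__ == "__main__":
    probs = [[0.90, 0.10], [0.80, 0.20], [0.40, 0.60], [0.45, 0.55]]
    # Class 0 has 90 annotations and class 1 only 10.
    print(select_pseudo_annotations(probs, [90, 10], budget=2))
```

With balanced counts the rule reduces to ordinary confidence-based selection. With skewed counts it prefers instances predicted as the minority class, which matches the qualitative behavior the abstract describes.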

Pages: 1823-1845

Page count: 22
