Learning from crowds with decision trees

Cited by: 14
Authors
Yang, Wenjun [1 ]
Li, Chaoqun [1 ]
Jiang, Liangxiao [2 ]
Affiliations
[1] China Univ Geosci, Sch Math & Phys, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
Keywords
Crowdsourcing learning; Weighted majority voting; Decision trees; Model quality; Statistical comparisons; Weighting filter; Improving data; Classifiers; Tool
DOI
10.1007/s10115-022-01701-9
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Crowdsourcing systems provide an efficient way to collect labeled data by employing non-expert crowd workers. In practice, each instance receives a multiple noisy label set from different workers. Ground truth inference algorithms are designed to infer the unknown true labels of data from these multiple noisy label sets. Since there is substantial variation among workers, evaluating worker quality is crucial for ground truth inference. This paper proposes a novel algorithm called decision tree-based weighted majority voting (DTWMV). DTWMV directly takes the multiple noisy label set of each instance as its feature vector; that is, each worker is treated as a feature of the instances. Sequential decision trees are then built to calculate the weight of each feature (worker). Finally, weighted majority voting is used to infer the integrated labels of the instances. In DTWMV, evaluating worker quality is converted into calculating feature weights, which provides a new perspective on the ground truth inference problem, and a novel decision tree-based feature weight measurement is proposed. Our experimental results show that DTWMV can effectively evaluate worker quality and improve the label quality of the data.
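The abstract only outlines the DTWMV pipeline, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes worker weights can be approximated by the feature importances of a single decision tree fit against an initial unweighted majority vote (the paper builds sequential trees and defines its own weight measurement), and it uses -1 to mark a missing answer; all function names here are illustrative.

```python
# Hypothetical sketch of the DTWMV idea (not the authors' reference code).
# Assumption: worker weights are approximated by decision-tree feature
# importances computed against an initial majority-vote label.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier


def majority_vote(row):
    """Plain majority vote over one instance's noisy labels (ignoring -1)."""
    votes = [label for label in row if label != -1]
    return Counter(votes).most_common(1)[0][0]


def dtwmv_sketch(noisy_labels):
    """noisy_labels: (n_instances, n_workers) array; -1 marks a missing answer."""
    X = np.asarray(noisy_labels)
    # Step 1: bootstrap target labels with unweighted majority voting.
    y0 = np.array([majority_vote(row) for row in X])
    # Step 2: treat each worker as a feature and fit a decision tree;
    # feature importances serve as worker-quality weights here.
    tree = DecisionTreeClassifier(random_state=0).fit(X, y0)
    weights = tree.feature_importances_
    # Step 3: weighted majority voting with the learned worker weights.
    integrated = []
    for row in X:
        scores = {}
        for worker, label in enumerate(row):
            if label != -1:
                scores[label] = scores.get(label, 0.0) + weights[worker]
        integrated.append(max(scores, key=scores.get))
    return np.array(integrated), weights
```

In this reading, a worker whose answers carry little information about the (bootstrapped) consensus receives a low importance and therefore little influence on the final integrated labels, which mirrors the abstract's claim that evaluating worker quality reduces to calculating feature weights.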
Pages: 2123-2140
Page count: 18