Learning from crowds with decision trees

Cited by: 14
Authors
Yang, Wenjun [1 ]
Li, Chaoqun [1 ]
Jiang, Liangxiao [2 ]
Affiliations
[1] China Univ Geosci, Sch Math & Phys, Wuhan 430074, Peoples R China
[2] China Univ Geosci, Sch Comp Sci, Wuhan 430074, Peoples R China
Keywords
Crowdsourcing learning; Weighted majority voting; Decision trees; Model quality; Statistical comparisons; Weighting filter; Improving data; Classifiers; Tool
DOI
10.1007/s10115-022-01701-9
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Crowdsourcing systems provide an efficient way to collect labeled data by employing non-expert crowd workers. In practice, each instance receives a multiple noisy label set from different workers. Ground truth inference algorithms are designed to infer the unknown true labels of data from these multiple noisy label sets. Since there is substantial variation among workers, evaluating worker quality is crucial for ground truth inference. This paper proposes a novel algorithm called decision tree-based weighted majority voting (DTWMV). DTWMV directly takes the multiple noisy label set of each instance as its feature vector; that is, each worker is treated as a feature of the instances. Sequential decision trees are then built to calculate the weight of each feature (worker). Finally, weighted majority voting is used to infer the integrated labels of the instances. In DTWMV, evaluating worker quality is converted into calculating feature weights, which provides a new perspective on the ground truth inference problem, and a novel decision tree-based feature weight measurement is proposed. Our experimental results show that DTWMV can effectively evaluate worker quality and improve the label quality of the data.
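The abstract only outlines the DTWMV pipeline, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes worker weights can be approximated by the feature importances of a single decision tree fit against an initial unweighted majority vote (the paper builds sequential trees and defines its own weight measurement), and it uses -1 to mark a missing answer; all function names here are illustrative.

```python
# Hypothetical sketch of the DTWMV idea (not the authors' reference code).
# Assumption: worker weights are approximated by decision-tree feature
# importances computed against an initial majority-vote label.
import numpy as np
from collections import Counter
from sklearn.tree import DecisionTreeClassifier


def majority_vote(row):
    """Plain majority vote over one instance's noisy labels (ignoring -1)."""
    votes = [label for label in row if label != -1]
    return Counter(votes).most_common(1)[0][0]


def dtwmv_sketch(noisy_labels):
    """noisy_labels: (n_instances, n_workers) array; -1 marks a missing answer."""
    X = np.asarray(noisy_labels)
    # Step 1: bootstrap target labels with unweighted majority voting.
    y0 = np.array([majority_vote(row) for row in X])
    # Step 2: treat each worker as a feature and fit a decision tree;
    # feature importances serve as worker-quality weights here.
    tree = DecisionTreeClassifier(random_state=0).fit(X, y0)
    weights = tree.feature_importances_
    # Step 3: weighted majority voting with the learned worker weights.
    integrated = []
    for row in X:
        scores = {}
        for worker, label in enumerate(row):
            if label != -1:
                scores[label] = scores.get(label, 0.0) + weights[worker]
        integrated.append(max(scores, key=scores.get))
    return np.array(integrated), weights
```

In this reading, a worker whose answers carry little information about the (bootstrapped) consensus receives a low importance and therefore little influence on the final integrated labels, which mirrors the abstract's claim that evaluating worker quality reduces to calculating feature weights.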
Pages: 2123-2140
Page count: 18