Multi-Class Ground Truth Inference in Crowdsourcing with Clustering

Cited by: 87
Authors
Zhang, Jing [1]
Sheng, Victor S. [2]
Wu, Jian [3]
Wu, Xindong [4, 5]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Dept Software Engn, 200 Xiaolingwei St, Nanjing 210094, Jiangsu, Peoples R China
[2] Univ Cent Arkansas, Dept Comp Sci, Conway, AR 72035 USA
[3] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[4] Hefei Univ Technol, Sch Comp Sci & Informat Engn, 193 Tunxi Rd, Hefei 230009, Peoples R China
[5] Univ Vermont, Dept Comp Sci, 33 Colchester Ave, Burlington, VT 05405 USA
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
Clustering; crowdsourcing; EM algorithm; ground truth inference; multi-class labeling;
DOI
10.1109/TKDE.2015.2504974
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because crowdsourced labelers are often of low quality, the integrated label of each example is usually inferred from the multiple noisy labels that different labelers provide. This paper proposes a novel algorithm, Ground Truth Inference using Clustering (GTIC), to improve the quality of integrated labels in multi-class labeling. For a K-class labeling task, GTIC uses the multiple noisy label set of each example to generate features. It then uses the K-Means algorithm to cluster all examples into K groups, each of which is mapped to a specific class, and assigns every example in a cluster the corresponding class label. We compare GTIC with four existing multi-class ground truth inference algorithms, majority voting (MV), Dawid & Skene's (DS), ZenCrowd (ZC), and Spectral DS (SDS), on one synthetic and eight real-world datasets. Experimental results show that GTIC significantly outperforms the others in terms of both accuracy and M-AUC. In addition, GTIC runs about twenty times faster than the more complicated EM-based inference algorithms.
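The pipeline summarized in the abstract (features from noisy labels, K-Means into K groups, one class per cluster) can be illustrated with a minimal sketch. The abstract does not say how features are built or how clusters are mapped to classes, so everything specific here is an assumption made for illustration, not the paper's method: the features are per-class vote fractions, the centroids are seeded at one-hot vectors, and each cluster is mapped to the class with the largest centroid component. The function names `label_features`, `kmeans`, and `infer_labels` are hypothetical.

```python
from collections import Counter


def label_features(labels, num_classes):
    """Assumed feature map: fraction of votes each class received for one example."""
    counts = Counter(labels)
    return [counts.get(k, 0) / len(labels) for k in range(num_classes)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means; returns (assignments, centroids)."""
    assignments = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignments = [
            min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no assignment changed
        assignments = new_assignments
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids


def infer_labels(noisy_label_sets, num_classes):
    """Cluster examples into num_classes groups and give each group one class label."""
    points = [label_features(labels, num_classes) for labels in noisy_label_sets]
    # Assumption: seed centroid k at the one-hot vector of class k.
    centroids = [
        [1.0 if j == k else 0.0 for j in range(num_classes)]
        for k in range(num_classes)
    ]
    assignments, centroids = kmeans(points, centroids)
    # Assumption: a cluster maps to the class with the largest centroid component.
    # Two clusters could map to the same class; this sketch does not resolve that.
    cluster_to_class = [max(range(num_classes), key=lambda j: c[j]) for c in centroids]
    return [cluster_to_class[a] for a in assignments]


if __name__ == "__main__":
    # Six examples, three labelers each, classes 0..2.
    votes = [[0, 0, 1], [0, 0, 0], [1, 1, 2], [1, 1, 1], [2, 2, 0], [2, 2, 2]]
    print(infer_labels(votes, num_classes=3))  # [0, 0, 1, 1, 2, 2]
```

On this toy input the result coincides with majority voting; any difference between the two approaches would only show up on noisier data, and this sketch makes no claim about the accuracy or running-time results reported above.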
Pages: 1080-1085
Page count: 6
Related papers
50 in total
  • [1] Crowdsourcing Truth Inference Based on Label Confidence Clustering
    Wu, Gongqing
    Zhou, Liangzhu
    Xia, Jiazhu
    Li, Lei
    Bao, Xianyu
    Wu, Xindong
    ACM TRANSACTIONS ON KNOWLEDGE DISCOVERY FROM DATA, 2023, 17 (04)
  • [2] Sequential Multi-Class Labeling in Crowdsourcing
    Kang, Qiyu
    Tay, Wee Peng
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2019, 31 (11) : 2190 - 2199
  • [3] Label Consistency-Based Ground Truth Inference for Crowdsourcing
    Li, Jiao
    Jiang, Liangxiao
    Zhang, Wenjun
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [4] Collusion detection and ground truth inference in crowdsourcing for labeling tasks
    Song, Changyue
    Liu, Kaibo
    Zhang, Xi
    Journal of Machine Learning Research, 2021, 22
  • [5] Multi-Factor Influencing Truth Inference in Crowdsourcing
    Zhang, Guangyuan
    Wang, Ning
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2021, 37 (05) : 1231 - 1246
  • [6] MULTI-CLASS INFERENCE WITH GAUSSIAN PROCESSES
    Cseke, Botond
    Csato, Lehel
    STUDIA UNIVERSITATIS BABES-BOLYAI MATHEMATICA, 2005, 50 (03): : 81 - 96
  • [7] Leveraging Label Category Relationships in Multi-class Crowdsourcing
    Jin, Yuan
    Du, Lan
    Zhu, Ye
    Carman, Mark
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2018, PT II, 2018, 10938 : 128 - 140
  • [8] Reliable Crowdsourcing for Multi-Class Labeling Using Coding Theory
    Vempaty, Aditya
    Varshney, Lav R.
    Varshney, Pramod K.
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2014, 8 (04) : 667 - 679
  • [9] A novel ground truth inference algorithm based on instance similarity for crowdsourcing learning
    Ma, Ben
    Li, Chaoqun
    Jiang, Liangxiao
    APPLIED INTELLIGENCE, 2022, 52 (15) : 17784 - 17796