Multi-Class Ground Truth Inference in Crowdsourcing with Clustering

Cited by: 87
Authors
Zhang, Jing [1]
Sheng, Victor S. [2]
Wu, Jian [3]
Wu, Xindong [4,5]
Affiliations
[1] Nanjing Univ Sci & Technol, Sch Comp Sci & Engn, Dept Software Engn, 200 Xiaolingwei St, Nanjing 210094, Jiangsu, Peoples R China
[2] Univ Cent Arkansas, Dept Comp Sci, Conway, AR 72035 USA
[3] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215006, Peoples R China
[4] Hefei Univ Technol, Sch Comp Sci & Informat Engn, 193 Tunxi Rd, Hefei 230009, Peoples R China
[5] Univ Vermont, Dept Comp Sci, 33 Colchester Ave, Burlington, VT 05405 USA
Funding
U.S. National Science Foundation; National Natural Science Foundation of China
Keywords
Clustering; crowdsourcing; EM algorithm; ground truth inference; multi-class labeling
DOI
10.1109/TKDE.2015.2504974
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Because crowdsourced labelers are often of low quality, the integrated label of each example is usually inferred from the multiple noisy labels provided by different labelers. This paper proposes a novel algorithm, Ground Truth Inference using Clustering (GTIC), to improve the quality of integrated labels for multi-class labeling. For a K-class labeling task, GTIC generates features from the multiple noisy label sets of the examples. It then uses the K-Means algorithm to cluster all examples into K groups, each of which is mapped to a specific class, and assigns every example in a cluster the corresponding class label. We compare GTIC with four existing multi-class ground truth inference algorithms, majority voting (MV), Dawid & Skene's method (DS), ZenCrowd (ZC), and Spectral DS (SDS), on one synthetic and eight real-world datasets. Experimental results show that GTIC performs significantly better than the other algorithms in terms of both accuracy and M-AUC. In addition, GTIC runs about twenty times faster than the more complex EM-based inference algorithms.
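The clustering pipeline summarized in the abstract (features from noisy labels, K-Means into K clusters, one class per cluster) can be illustrated with the minimal pure-Python sketch below. It is not the paper's exact GTIC procedure: it assumes (1) each example's feature vector is the fraction of its noisy labels voting for each class, (2) K-Means starts from the K one-hot vectors, and (3) clusters are mapped one-to-one to classes by brute-force search over permutations. All function names (`vote_features`, `kmeans`, `map_clusters_to_classes`, `gtic_sketch`, `majority_vote`) are hypothetical, and a majority-voting (MV) baseline is included for comparison.

```python
from itertools import permutations


def vote_features(noisy_labels, k):
    """Turn each example's noisy labels into a K-dim vector of per-class vote fractions.

    ASSUMPTION: an illustrative feature choice, not necessarily the paper's.
    """
    feats = []
    for labels in noisy_labels:
        counts = [0.0] * k
        for lab in labels:
            counts[lab] += 1.0
        total = sum(counts) or 1.0  # avoid dividing by zero for unlabeled examples
        feats.append([c / total for c in counts])
    return feats


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Plain Lloyd-style K-Means.

    ASSUMPTION: centroids start at the K one-hot vectors (one per class),
    which makes the run deterministic.
    """
    centroids = [[1.0 if j == c else 0.0 for j in range(k)] for c in range(k)]
    assign = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assign) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assign == assign:  # converged: assignments stopped changing
            break
        assign = new_assign
    return assign, centroids


def map_clusters_to_classes(centroids, k):
    """Map clusters one-to-one to classes.

    ASSUMPTION: choose the permutation that maximizes the total centroid weight
    on the assigned classes; brute force over K! permutations, so small K only.
    """
    best = max(permutations(range(k)),
               key=lambda perm: sum(centroids[c][perm[c]] for c in range(k)))
    return list(best)


def gtic_sketch(noisy_labels, k):
    """Features -> K-Means -> cluster-to-class mapping -> one label per example."""
    feats = vote_features(noisy_labels, k)
    assign, centroids = kmeans(feats, k)
    mapping = map_clusters_to_classes(centroids, k)
    return [mapping[a] for a in assign]


def majority_vote(noisy_labels, k):
    """Majority-voting (MV) baseline; ties go to the lowest class index."""
    out = []
    for labels in noisy_labels:
        counts = [labels.count(c) for c in range(k)]
        out.append(counts.index(max(counts)))
    return out


if __name__ == "__main__":
    # Toy data: each inner list holds the labels three labelers gave one example.
    noisy = [[0, 0, 1], [0, 0, 0], [1, 1, 2], [1, 1, 1], [2, 2, 0], [2, 2, 2]]
    print(gtic_sketch(noisy, 3))  # -> [0, 0, 1, 1, 2, 2]
    print(majority_vote(noisy, 3))  # -> [0, 0, 1, 1, 2, 2]
```

On this clean toy data both methods agree; the sketch only shows the shape of the pipeline, not the accuracy gains reported in the paper. The brute-force mapping grows as K!, so for larger K an assignment solver such as the Hungarian algorithm would be the usual substitute.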
Pages: 1080-1085
Number of pages: 6