Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning

Cited by: 96
Authors
Mozafari, Barzan [1]
Sarkar, Purna [2]
Franklin, Michael [1, 2]
Jordan, Michael [1]
Madden, Samuel [3]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] Univ Calif Berkeley, Berkeley, CA 94720 USA
[3] MIT, Cambridge, MA 02139 USA
Source
PROCEEDINGS OF THE VLDB ENDOWMENT | 2014 / Vol. 8 / No. 02
DOI
10.14778/2735471.2735474
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Crowd-sourcing has become a popular means of acquiring labeled data for many tasks where humans are more accurate than computers, such as image tagging, entity resolution, and sentiment analysis. However, due to the time and cost of human labor, solutions that rely solely on crowd-sourcing are often limited to small datasets (i.e., a few thousand items). This paper proposes algorithms for integrating machine learning into crowd-sourced databases in order to combine the accuracy of human labeling with the speed and cost-effectiveness of machine learning classifiers. By using active learning as our optimization strategy for labeling tasks in crowd-sourced databases, we can minimize the number of questions asked to the crowd, allowing crowd-sourced applications to scale (i.e., label much larger datasets at lower costs). Designing active learning algorithms for a crowd-sourced database poses many practical challenges: such algorithms need to be generic, scalable, and easy to use, even for practitioners who are not machine learning experts. We draw on the theory of nonparametric bootstrap to design, to the best of our knowledge, the first active learning algorithms that meet all these requirements. Our results, on 3 real-world datasets collected with Amazon's Mechanical Turk, and on 15 UCI datasets, show that our methods on average ask 1-2 orders of magnitude fewer questions than the baseline, and 4.5-44x fewer than existing active learning algorithms.
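To illustrate how a nonparametric bootstrap can drive the choice of which items to send to the crowd, here is a minimal sketch. It is not the authors' algorithm: the nearest-centroid classifier, the function names (`train_centroid`, `bootstrap_uncertainty`, `select_for_crowd`), the one-dimensional toy data, and the variance-based ranking are all assumptions made for illustration. The idea shown is only the general one: retrain a black-box classifier on resampled labeled data, and ask humans about the items on which the retrained classifiers disagree most.

```python
import random
from statistics import mean


def train_centroid(points):
    """Hypothetical stand-in classifier: nearest centroid on (x, label) pairs, labels 0/1."""
    class0 = [x for x, y in points if y == 0]
    class1 = [x for x, y in points if y == 1]
    # A bootstrap resample may miss a class entirely; then predict the class that is present.
    if not class0:
        return lambda x: 1
    if not class1:
        return lambda x: 0
    m0, m1 = mean(class0), mean(class1)
    return lambda x: 0 if abs(x - m0) <= abs(x - m1) else 1


def bootstrap_uncertainty(labeled, unlabeled, k=50, seed=0):
    """Score each unlabeled item by the variance of its predicted label across k bootstrap classifiers."""
    rng = random.Random(seed)
    votes = [0] * len(unlabeled)
    for _ in range(k):
        # Nonparametric bootstrap: resample the labeled set with replacement, same size.
        resample = [rng.choice(labeled) for _ in labeled]
        classify = train_centroid(resample)
        for i, x in enumerate(unlabeled):
            votes[i] += classify(x)
    # Variance of a 0/1 prediction with empirical rate p is p * (1 - p).
    return [(v / k) * (1 - v / k) for v in votes]


def select_for_crowd(labeled, unlabeled, budget, k=50, seed=0):
    """Return the `budget` unlabeled items with the highest bootstrap variance."""
    scores = bootstrap_uncertainty(labeled, unlabeled, k, seed)
    ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
    return [unlabeled[i] for i in ranked[:budget]]


# Toy data (assumed): two well-separated clusters, with some unlabeled items near the boundary.
labeled = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
unlabeled = [0.5, 5.1, 9.5, 4.9]
to_ask = select_for_crowd(labeled, unlabeled, budget=2)
```

In this toy setup the items near the class boundary receive the highest scores, so they would be the ones routed to human workers, while items deep inside a cluster are left to the classifier. Because the bootstrap only needs to retrain and query the classifier, the same loop works with any learner, which is one plausible reading of the "generic" requirement in the abstract.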
Pages: 125-136
Number of pages: 12