Scaling Up Crowd-Sourcing to Very Large Datasets: A Case for Active Learning

Cited by: 96

Authors:

Mozafari, Barzan [1]

Sarkar, Purna [2]

Franklin, Michael [1,2]

Jordan, Michael [1]

Madden, Samuel [3]

Affiliations:

[1] Univ Michigan, Ann Arbor, MI 48109 USA

[2] Univ Calif Berkeley, Berkeley, CA 94720 USA

[3] MIT, Cambridge, MA 02139 USA

Source:

PROCEEDINGS OF THE VLDB ENDOWMENT | 2014 / Vol. 8 / No. 02

Keywords:

DOI:

10.14778/2735471.2735474

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Crowd-sourcing has become a popular means of acquiring labeled data for many tasks where humans are more accurate than computers, such as image tagging, entity resolution, and sentiment analysis. However, due to the time and cost of human labor, solutions that rely solely on crowd-sourcing are often limited to small datasets (i.e., a few thousand items). This paper proposes algorithms for integrating machine learning into crowd-sourced databases in order to combine the accuracy of human labeling with the speed and cost-effectiveness of machine learning classifiers. By using active learning as our optimization strategy for labeling tasks in crowd-sourced databases, we can minimize the number of questions asked to the crowd, allowing crowd-sourced applications to scale (i.e., label much larger datasets at lower costs). Designing active learning algorithms for a crowd-sourced database poses many practical challenges: such algorithms need to be generic, scalable, and easy to use, even for practitioners who are not machine learning experts. We draw on the theory of nonparametric bootstrap to design, to the best of our knowledge, the first active learning algorithms that meet all these requirements. Our results, on 3 real-world datasets collected with Amazon's Mechanical Turk, and on 15 UCI datasets, show that our methods on average ask 1-2 orders of magnitude fewer questions than the baseline, and 4.5-44x fewer than existing active learning algorithms.
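
The abstract states only that the algorithms draw on the nonparametric bootstrap; it does not give their details. As a rough illustration of the general idea, and not the authors' method, the sketch below resamples the labeled set with replacement, retrains a toy classifier on each resample, and sends the crowd the unlabeled items whose predicted label varies most across resamples. Everything here is an assumption made for illustration: the 1-D nearest-centroid classifier, the function names (`train_centroids`, `bootstrap_uncertainty`, `select_for_crowd`), and the parameters `k` and `budget`.

```python
import random
from statistics import mean


def train_centroids(labeled):
    """Fit a toy nearest-centroid classifier on (feature, label) pairs, labels 0/1."""
    groups = {0: [], 1: []}
    for x, y in labeled:
        groups[y].append(x)
    # A bootstrap resample can miss a class entirely; mark that centroid as None.
    return {y: (mean(xs) if xs else None) for y, xs in groups.items()}


def predict(centroids, x):
    """Predict the label whose centroid is closest to x."""
    c0, c1 = centroids[0], centroids[1]
    if c0 is None:
        return 1
    if c1 is None:
        return 0
    return 0 if abs(x - c0) <= abs(x - c1) else 1


def bootstrap_uncertainty(labeled, unlabeled, k=50, seed=0):
    """Score each unlabeled item by the variance of its predicted label
    across k classifiers trained on bootstrap resamples of the labeled set."""
    rng = random.Random(seed)
    votes = [0] * len(unlabeled)
    for _ in range(k):
        # Nonparametric bootstrap: draw len(labeled) items with replacement.
        resample = rng.choices(labeled, k=len(labeled))
        model = train_centroids(resample)
        for i, x in enumerate(unlabeled):
            votes[i] += predict(model, x)
    # Variance of a 0/1 vote with mean p is p * (1 - p).
    return [(v / k) * (1 - v / k) for v in votes]


def select_for_crowd(labeled, unlabeled, budget, k=50, seed=0):
    """Return the `budget` unlabeled items with the highest bootstrap variance."""
    scores = bootstrap_uncertainty(labeled, unlabeled, k, seed)
    ranked = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
    return [unlabeled[i] for i in ranked[:budget]]


if __name__ == "__main__":
    # Hypothetical data: two well-separated classes and three unlabeled items.
    labeled = [(0.0, 0), (1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1), (10.0, 1)]
    unlabeled = [0.5, 5.1, 9.5]
    print(select_for_crowd(labeled, unlabeled, budget=1))
```

In this toy setting the item near the class boundary should attract the most disagreement between resampled classifiers, so it is the one a limited crowd budget would be spent on, while the remaining items are left to the classifier.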


Pages: 125-136

Page count: 12
