Abstract
Supervised learning depends on annotated examples, which are taken to be the
ground truth. But these labels often come from noisy crowdsourcing
platforms, like Amazon Mechanical Turk. Practitioners typically collect
multiple labels per example and aggregate the results to mitigate noise (the
classic crowdsourcing problem). Given a fixed annotation budget and unlimited
unlabeled data, redundant annotation comes at the expense of fewer labeled
examples. This raises two fundamental questions: (1) How can we best learn from
noisy workers? (2) How should we allocate our labeling budget to maximize the
performance of a classifier? We propose a new algorithm for jointly modeling
labels and worker quality from noisy crowdsourced data. The alternating
minimization proceeds in rounds, estimating worker quality from disagreement
with the current model and then updating the model by optimizing a loss
function that accounts for the current estimate of worker quality. Unlike
previous approaches, our algorithm can estimate worker quality even when only
one annotation per example is available. We establish a generalization error
bound for models learned with our algorithm and show theoretically that, when
worker quality exceeds a threshold, it is better to label many examples once
than to label fewer examples multiple times. Experiments on both ImageNet (with simulated noisy
workers) and MS-COCO (using the real crowdsourced labels) confirm our
algorithm's benefits.
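
The alternating-minimization loop described above lends itself to a compact illustration. The following is a minimal sketch, not the paper's exact algorithm: it assumes one crowd label per example, a logistic-regression base model, and uses each worker's agreement rate with the current model's predictions as the quality estimate; all function names, defaults, and the data layout are hypothetical.

```python
# Minimal sketch of the alternating-minimization idea (illustrative only).
# Assumptions: X, y_noisy, worker_ids are NumPy arrays; one label per example;
# logistic regression stands in for the learned classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

def alternating_minimization(X, y_noisy, worker_ids, n_rounds=5):
    """X: (n, d) features; y_noisy: (n,) crowd labels;
    worker_ids: (n,) id of the worker who produced each label."""
    n_workers = worker_ids.max() + 1
    quality = np.full(n_workers, 0.5)          # initial worker-quality guess
    model = LogisticRegression(max_iter=1000)

    for _ in range(n_rounds):
        # (1) Update the model: fit on noisy labels, weighting each example
        #     by the estimated quality of the worker who labeled it.
        weights = quality[worker_ids]
        model.fit(X, y_noisy, sample_weight=weights)

        # (2) Re-estimate worker quality from agreement between each worker's
        #     labels and the current model's predictions.
        preds = model.predict(X)
        for w in range(n_workers):
            mask = worker_ids == w
            if mask.any():
                quality[w] = (preds[mask] == y_noisy[mask]).mean()

    return model, quality
```

In this sketch the loss-weighting step is deliberately simple (per-example sample weights); the paper's loss function that "accounts for the current estimate of worker quality" may differ.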