Abstract
Many real world problems can now be effectively solved using supervised
machine learning. A major roadblock is often the lack of an adequate quantity
of labeled data for training. A possible solution is to assign the task of
labeling data to a crowd, and then infer the true label using aggregation
methods. A well-known approach for aggregation is the Dawid-Skene (DS)
algorithm, which is based on the principle of Expectation-Maximization (EM). We
propose a new simple, yet effective, EM-based algorithm, which can be
interpreted as a `hard' version of DS, that allows much faster convergence
while maintaining similar accuracy in aggregation. We show the use of this
algorithm as a quick and effective technique for online, real-time sentiment
annotation. We also prove that our algorithm converges to the estimated labels
at a linear rate. Our experiments on standard datasets show a significant
speedup in time taken for aggregation - upto $\sim$8x over Dawid-Skene and
$\sim$6x over other fast EM methods, at competitive accuracy performance. The
code for the implementation of the algorithms can be found at
https://github.com/GoodDeeds/Fast-Dawid-Skene
Users
Please
log in to take part in the discussion (add own reviews or comments).