MACE (Multi-Annotator Competence Estimation) is an implementation of an item-response model that lets you evaluate redundant annotations of categorical data. It provides competence estimates for the individual annotators and the most likely answer to each item.
If we have 10 annotators answer a question, and five answer 'yes' and five answer 'no' (a surprisingly frequent event), we would normally have to flip a coin to decide what the right answer is. If we knew, however, that one of the people who answered 'yes' is an expert on the question, while one of the others just always selects 'no', we could take this information into account and weight their answers accordingly. MACE does exactly that. It tries to find out which annotators are more trustworthy and upweights their answers. All you need to provide is a CSV file with one item per line.
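The weighting idea in the yes/no example can be illustrated with a small Python sketch. This is not MACE's algorithm: MACE estimates annotator trust from the data itself, whereas the sketch assumes the trust scores are already known. The function name `weighted_vote`, the annotator ids, and the trust values are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def weighted_vote(answers, trust, default_trust=0.5):
    """Return the label with the highest summed trust weight.

    answers: dict mapping annotator id -> label
    trust:   dict mapping annotator id -> weight (hypothetical, not estimated here)
    """
    totals = defaultdict(float)
    for annotator, label in answers.items():
        # Annotators without a trust score fall back to the default weight.
        totals[label] += trust.get(annotator, default_trust)
    return max(totals, key=totals.get)


# Ten annotators, split five to five.
answers = {f"a{i}": "yes" for i in range(5)}
answers.update({f"a{i}": "no" for i in range(5, 10)})

# Assumed trust scores: a0 is an expert, a9 always answers 'no'.
trust = {name: 0.5 for name in answers}
trust["a0"] = 0.95
trust["a9"] = 0.05

result = weighted_vote(answers, trust)
print(result)
```

With these assumed weights, the 'yes' votes sum to 2.95 and the 'no' votes to 2.05, so the tie is broken in favor of the expert's answer instead of by a coin flip.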
In tests, MACE's trust estimates correlated highly with the annotators' true competence, and it achieved accuracies of over 0.9 on several test sets. MACE can also take annotated items into account if they are available. This helps guide the training and improves accuracy.