Fleiss' kappa is a statistical measure for assessing the reliability of agreement between a fixed number of raters when assigning categorical ratings to a set of items. It contrasts with other kappas such as Cohen's kappa, which only works for assessing agreement between two raters. The measure quantifies the degree to which the observed agreement exceeds what would be expected by chance: it reaches 1 for complete agreement, equals 0 when agreement is exactly at the chance level, and can be negative when agreement is worse than chance. There is no generally agreed-upon threshold of significance for interpreting its value, although guidelines have been proposed.
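The calculation can be sketched as follows, using the standard formulation: `ratings[i][j]` holds the number of raters who assigned item `i` to category `j`, and every item must be rated by the same number of raters (the function name and input layout are illustrative choices, not from the original text):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a matrix where ratings[i][j] is the number
    of raters who assigned item i to category j. Assumes every item
    is rated by the same number of raters."""
    N = len(ratings)        # number of items
    n = sum(ratings[0])     # raters per item
    k = len(ratings[0])     # number of categories

    # p_j: proportion of all assignments that went to category j
    p = [sum(row[j] for row in ratings) / (N * n) for j in range(k)]

    # P_i: extent of agreement among the raters on item i
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in ratings]

    P_bar = sum(P_i) / N              # mean observed agreement
    P_e = sum(pj * pj for pj in p)    # agreement expected by chance
    return (P_bar - P_e) / (1 - P_e)
```

For example, three items each rated identically by all three raters yield a kappa of 1 (complete agreement), while raters splitting evenly on every item yield a negative kappa (agreement worse than chance).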