
Non-Expert Evaluation of Summarization Systems is Risky

Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 148-151. (2010)

Abstract

We provide evidence that intrinsic evaluation of summaries using Amazon’s Mechanical Turk is quite difficult. Experiments mirroring evaluation at the Text Analysis Conference’s summarization track show that non-expert judges are not able to recover system rankings derived from experts.
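The abstract's central claim is that non-expert judges fail to recover the system rankings produced by experts. Agreement between two rankings of the same systems is commonly measured with a rank correlation such as Kendall's tau (the abstract does not say which statistic the paper uses, so this is an illustrative sketch; the system names and rankings below are hypothetical):

```python
from itertools import combinations

def kendall_tau(rank_a, rank_b):
    """Kendall's tau rank correlation between two rankings of the same systems.

    Returns 1.0 for identical rankings, -1.0 for fully reversed ones.
    """
    assert set(rank_a) == set(rank_b), "rankings must cover the same systems"
    pos_a = {s: i for i, s in enumerate(rank_a)}
    pos_b = {s: i for i, s in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant if both rankings order x and y the same way.
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(rank_a)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical expert vs. non-expert rankings of four summarization systems.
expert = ["sysA", "sysB", "sysC", "sysD"]
turker = ["sysB", "sysA", "sysD", "sysC"]
print(kendall_tau(expert, expert))  # -> 1.0 (perfect agreement)
print(kendall_tau(expert, turker))  # -> 0.333... (weak agreement)
```

A tau near 1 would mean the non-expert judges recovered the expert ranking; the paper's finding is that, in practice, the correlation is too low for MTurk judgments to substitute for expert evaluation.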
