
The IIR evaluation model: a framework for evaluation of interactive information retrieval systems

Information Research (2003)

Abstract

An alternative approach to the evaluation of interactive information retrieval (IIR) systems, referred to as the IIR evaluation model, is proposed. The model provides a framework for the collection and analysis of IR interaction data. The aim of the model is two-fold: 1) to facilitate the evaluation of IIR systems as realistically as possible with reference to actual information searching and retrieval processes, though still in a relatively controlled evaluation environment; and 2) to calculate IIR system performance taking into account the non-binary nature of the assigned relevance assessments. The IIR evaluation model is presented as an alternative to the system-driven Cranfield model (Cleverdon, Mills & Keen, 1966; Cleverdon & Keen, 1966), which is still the dominant approach to the evaluation of IR and IIR systems. Key elements of the IIR evaluation model are the use of realistic scenarios, known as simulated work task situations, and the (call for) alternative performance measures. A simulated work task situation, which is a short 'cover story', serves two main functions: 1) it triggers and develops a simulated information need by allowing for user interpretations of the situation, leading to cognitively individual information need interpretations as in real life; and 2) it is the platform against which situational relevance is judged. Further, because the situation is the same for all test persons, it provides experimental control. Hence, the concept of a simulated work task situation gives the experiment both realism and control. Guidelines and recommendations for the application of simulated work task situations are provided. Examples of alternative performance measures are: relative relevance (RR), ranked half-life (RHL) (Borlund & Ingwersen, 1998), cumulated gain (CG) and cumulated gain with discount (DCG) (Järvelin & Kekäläinen, 2000). These measures can incorporate non-binary relevance assessments, which are needed because realistic user interaction and relevance assessment behaviour, in the course of searching and judging retrieved information objects, produces graded rather than binary judgements.
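
To make the last two measures concrete, the following is a minimal Python sketch of cumulated gain and discounted cumulated gain over a ranked list of graded relevance scores. It is an illustration under stated assumptions, not the implementation used in the paper: the function names, the 0 to 3 relevance scale, the sample scores, and the choice of logarithm base 2 are all hypothetical, and the discount is assumed to apply only from the rank equal to the base onwards.

```python
import math


def cumulated_gain(gains):
    """Running sum of graded relevance scores down the ranked list."""
    total = 0.0
    result = []
    for gain in gains:
        total += gain
        result.append(total)
    return result


def discounted_cumulated_gain(gains, base=2):
    """Cumulated gain where each score is divided by log_base(rank).

    Assumption: ranks below `base` are not discounted, which also
    avoids dividing by log(1) = 0 at rank 1.
    """
    total = 0.0
    result = []
    for rank, gain in enumerate(gains, start=1):
        if rank < base:
            total += gain
        else:
            total += gain / math.log(rank, base)
        result.append(total)
    return result


# Hypothetical graded assessments (0 = not relevant, 3 = highly relevant)
# for the first ten documents of one ranked result list.
scores = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]
print(cumulated_gain(scores))
print([round(v, 2) for v in discounted_cumulated_gain(scores)])
```

Because the scores are summed rather than thresholded, a highly relevant document contributes more than a marginally relevant one, and the discount reduces the credit for documents found late in the ranking.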
