Book section

General Principles of User-Oriented Evaluation

M. King.
Evaluation of Text and Speech Systems, Chapter 5, Dordrecht, (2007)
DOI: 10.1007/978-1-4020-5817-2_5

Abstract

This chapter is concerned with a particular perspective on the problem of evaluation design. User-oriented evaluation takes as primary some user or set of users who need to accomplish some task, and sets out to discover through evaluation whether a given software system will help them to do so effectively, productively, safely, and with a sense of satisfaction. (Note that, following ISO, user here is used in a very wide sense and encompasses much more than what has conventionally been called end-user.) There is a clear tension between taking specific user needs as primary and seeking common principles for the evaluation of particular software applications. The chapter suggests that this tension may be resolved by using an ISO standard for the evaluation of software as an appropriate level of generalization (ISO 9126). Quality models reflecting the characteristics of specific software applications (machine translation, document retrieval, information extraction systems, etc.) are then built on the skeleton set out in the ISO standard. Particular user needs are taken into account by picking out those parts of the appropriate quality model which reflect the needs, where necessary imposing a relative order of importance on the parts picked out. Execution of the evaluation then concentrates on the parts of the quality model chosen as pertinent to the user and the context of work. The focus of the chapter is on general design questions rather than on the strengths and weaknesses of specific metrics. However, there is some discussion of what it means for a metric to be valid and reliable, and of the difficulty of finding good metrics for those cases where system performance and human performance in interaction with the system are inextricably linked. A suggestion is made that it might be possible to automate an important part of the process of evaluation design, and an attempt to do this for the case of machine translation evaluations is briefly sketched.

BibTeX key: King07p125
Entry type: incollection
Address: Dordrecht
Book title: Evaluation of Text and Speech Systems
Year: 2007
Chapter: 5
Pages: 125-161
BibTeX crossref: DybkjaerHemsenMinker2007
file: SpringerLink:2007/King07p125.pdf:PDF
groups: public
intrahash: f176222916e56a72c6b4bb18648ca3d6
DOI: 10.1007/978-1-4020-5817-2_5
timestamp: 2008.05.01
username: flint63


Cite this publication

%0 Book Section
%1 King07p125
%A King, Margaret
%B Evaluation of Text and Speech Systems
%C Dordrecht
%D 2007
%K v1205 springer paper language processing user interaction interface test
%P 125-161
%R 10.1007/978-1-4020-5817-2_5
%T General Principles of User-Oriented Evaluation
%& 5

@incollection{King07p125,
  added-at = {2012-05-30T10:49:13.000+0200},
  address = {Dordrecht},
  author = {King, Margaret},
  biburl = {https://www.bibsonomy.org/bibtex/2f176222916e56a72c6b4bb18648ca3d6/flint63},
  booktitle = {Evaluation of Text and Speech Systems},
  chapter = 5,
  crossref = {DybkjaerHemsenMinker2007},
  doi = {10.1007/978-1-4020-5817-2_5},
  file = {SpringerLink:2007/King07p125.pdf:PDF},
  groups = {public},
  interhash = {b3cb48779173bd5f52f619b8d63acd6f},
  intrahash = {f176222916e56a72c6b4bb18648ca3d6},
  keywords = {v1205 springer paper language processing user interaction interface test},
  pages = {125-161},
  timestamp = {2018-04-16T12:06:04.000+0200},
  title = {General Principles of User-Oriented Evaluation},
  username = {flint63},
  year = 2007
}
