Dissertation - Hypothesis
9-3-95

Model

Evaluation of information retrieval systems has often been treated as "black-box" evaluation: "How well does the system match results to the query?" While such a question is important, it is not complete.

As we move toward considering more of the human interface design of a system, we have other questions as well: "Does the system encourage the user to ask the right questions? Can the user understand the results? Does the system lead the user down a good path?"

Information retrieval is a process, and real systems must be evaluated in that context. Two systems are likely to diverge in real use, as the user sees different results and asks different questions. Rather than treating the system as a black box (Input --> Output), we also consider internal effects and external effects:

    Input --> (Internal Effects) --> Output
      |                                |
       \                              /
                      V
               External Effects

That is, we want to know what's happening inside the black box, as well as any side effects. Since this is a process, its steps have a history and may affect future steps.

Two systems might have the same measurable output for direct performance of a task. For example, users of two library systems might each search for something and arrive at lists of items that an expert evaluator judges to be equally good results for the search undertaken. However, by looking inside the process by which the systems are used, or by looking at effects other than "direct quality of retrieval," we may discover reasons to prefer one system over the other.

Hypotheses

Suppose the "Input" is a set of tasks with some medium- to long-term commitment (not just one-shot queries intended to be tested and thrown away).

Output

These measures are the traditional ones that evaluate how well a user performs a given task.

Task Performance

O1. There are important tasks for which using a powerful browser yields higher-quality results.

User Preference

O5. There are important tasks that some people prefer to undertake using a powerful browser.

Internal Effects

These measures assess what happens to the user during performance of a task. While we expect that "better" internal effects will lead to better output, it would not be a contradiction if they did not. (For example, one system might allow extremely fast query formulation but still produce worse output because of other factors. In another context, the fastest guitarist is not necessarily the best guitarist.)

Task Performance

IE1. Seeing items in context helps people elaborate their information needs more quickly.

Preference

IE7. Some users prefer interactive to "batch" processing.

Perception

IE8. The SortTables browser provides a steady sense of satisfaction and progress.

External Effects

These are the side effects of using the system. They do not show up directly in the output, but may show up in learning or future performance, so measuring output quality alone may miss this potential future payoff.

EE1. People learn more by browsing.
Copyright 1994-2006, William C. Wake - William.Wake@acm.org