Dissertation - Hypothesis

9-3-95

Model

Evaluation of information retrieval systems has often been treated as "black-box" evaluation: "How well does the system match results to the query?" While such a question is important, it is not complete.

As we move toward considering more of the human interface design of a system, we have other questions as well: "Does the system encourage the user to ask the right questions? Can the user understand the results? Does the system lead the user down a good path?"

Information retrieval is a process, and real systems must be evaluated in that context. Two systems are likely to diverge in real use, as the user sees different results and asks different questions.

Rather than Input-->Output, as a black box, we also consider internal effects and external effects:

    Input-->(Internal Effects)-->Output
                  | |
                  \ /
                   V
            External Effects

That is, we want to know what's happening inside the black box, as well as any side effects. Since this is a process, the steps involved have a history, and may affect future steps.

Two systems might have the same measurable output for direct performance of a task. For example, users of two library systems might look for something, and arrive at lists of items that an expert evaluator decides are equally good results for the search undertaken. However, by looking inside the process by which the systems are used, or looking at effects other than "direct quality of retrieval," we may discover reasons to prefer one system over the other.

Hypotheses

Suppose the "Input" is a set of tasks with some medium- to long-term commitment (not just one-shot queries intended to be tested and thrown away).

Output

These measures are the traditional ones that evaluate how well a user performs a given task.

Task Performance

O1. There are important tasks that yield higher-quality results using a powerful browser.
  • A large class of searches require looking at what's there to decide what to do next.
  • But there may be useful tasks that don't need the interaction.
O2. There are important tasks that yield results faster using a powerful browser.
O3. Users of a browser give better estimates of the number of relevant items.
  • They have been exposed to items along the way, and may have a sense of what proportion are useful.
  • But this exposure may not be helpful - the users may get just as good a sense from a query result.
O4. Users of a browser can better estimate the distribution of items' attributes.
  • Browsing exposes users to the attributes much more often.
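
One way O3 might be scored is to compare each user's estimate of the number of relevant items against a count made by an expert evaluator. The sketch below is only an illustration under that assumption: the function names and the choice of relative error as the measure are hypothetical, not part of the study design.

```python
def relative_error(estimated: int, actual: int) -> float:
    """Absolute error of a user's estimate, as a fraction of the true count."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return abs(estimated - actual) / actual


def mean_relative_error(pairs) -> float:
    """Average relative error over (estimated, actual) pairs, one per task."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (estimated, actual) pair")
    return sum(relative_error(e, a) for e, a in pairs) / len(pairs)
```

Under this scoring, O3 would predict a lower mean relative error for browser users than for users of a query-only system on the same tasks.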

User Preference

O5. There are important tasks that some people prefer to undertake using a powerful browser.
  • As for O1.

Internal Effects

These measures assess what happens to the user during performance of a task. While we expect that "better" internal effects will lead to better output, it's not a contradiction if this is not the case. (For example, one system might allow extremely fast query formulation, but still have worse output because of other factors. In another context, the fastest guitarist might not be the best guitarist.)

Task Performance

IE1. Seeing items in context helps people elaborate their information needs more quickly.
  • Seeing what is there may help guide them to the useful parts.
IE2. The SortTables browser can perform quickly enough on moderately large collections (100K-1M items).
IE3. Use of the SortTables browser will lead to fewer false trails than use of the xxx or yyy systems.
IE4. SortTables users will tend not to give up on tasks as quickly as users of systems xxx or yyy.
IE5. Users can better demonstrate a sense of range boundaries and attribute values using the SortTables browser.
IE6. Of all items shown, a higher percentage of potentially relevant items are shown when using the SortTables browser.
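
The measure in IE6 can be stated as a simple ratio. The sketch below assumes that a session log yields the identifiers of the items shown, that an evaluator supplies the set of potentially relevant identifiers, and that each distinct item is counted once. The function name and these inputs are hypothetical.

```python
def fraction_relevant_shown(shown_ids, relevant_ids) -> float:
    """Share of the distinct items shown that are potentially relevant."""
    shown = set(shown_ids)
    if not shown:
        return 0.0  # nothing was shown, so no relevant items were shown
    return len(shown & set(relevant_ids)) / len(shown)
```

IE6 would then predict a higher value of this ratio for SortTables sessions than for sessions with the comparison systems.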

Preference

IE7. Some users prefer interactive to "batch" processing.

Perception

IE8. The SortTables browser provides a steady sense of satisfaction and progress.
  • Its interaction steps are of similar power.
  • But this sense of satisfaction/progress may not translate into better searches.
IE9. The SortTables browser provides a better sense of closure when the task is complete.

External Effects

These are the side effects of using the system. They do not show up directly in the output, but may show up in learning or in future performance. That is, the payoff is deferred and may not be reflected in the quality of the current output.

EE1. People learn more by browsing.
EE2. People learn more about the search process when browsing, making future searching faster.
EE3. People using SortTables better retain learned information across sessions.
EE4. People using SortTables better retain their ability to use the system across sessions.
EE5. People using a browser learn more about their subject when given a broad problem.
  • The incidental exposure to related information may pay off in future exploration of a related problem.

Copyright 1994-2006, William C. Wake - William.Wake@acm.org