A standards-based search system.
The field of online bibliographic search has an international standard associated with it, known as Z39.50: Information Retrieval (Z39.50-1995): Application Service Definition and Protocol Specification.
The protocol is fairly complex, as it uses the ASN.1 notation from the OSI network model and the BER encoding format. BER is a binary, variable-length format that can encode hierarchical information. Only systems that need to interoperate with other Z39.50 systems must use this format.
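To give a feel for what "binary, variable-length, and hierarchical" means, here is a minimal sketch of BER's tag-length-value layout. It is an illustration only: it handles three universal types and short lengths, the function names are made up for this example, and it does not build real Z39.50 protocol messages.

```python
# Minimal illustration of BER's tag-length-value (TLV) layout.
# Assumption: only short-form lengths (under 128 bytes) are supported.

TAG_INTEGER = 0x02
TAG_OCTET_STRING = 0x04
TAG_SEQUENCE = 0x30  # constructed type: its value is a run of nested TLVs


def tlv(tag: int, value: bytes) -> bytes:
    """Wrap a value in a one-byte tag and a one-byte short-form length."""
    if len(value) >= 128:
        raise ValueError("long-form lengths are omitted from this sketch")
    return bytes([tag, len(value)]) + value


def encode_integer(n: int) -> bytes:
    """Encode an integer in the fewest two's-complement bytes."""
    bits = n.bit_length() if n >= 0 else (~n).bit_length()
    length = bits // 8 + 1  # one extra bit is needed for the sign
    return tlv(TAG_INTEGER, n.to_bytes(length, "big", signed=True))


def encode_octet_string(data: bytes) -> bytes:
    return tlv(TAG_OCTET_STRING, data)


def encode_sequence(*items: bytes) -> bytes:
    """Nest already-encoded items inside a SEQUENCE."""
    return tlv(TAG_SEQUENCE, b"".join(items))


encoded = encode_sequence(encode_integer(5), encode_octet_string(b"hi"))
print(encoded.hex())  # 300702010504026869
```

The outer SEQUENCE carries a length of 7 because it contains a 3-byte INTEGER and a 4-byte OCTET STRING; nesting one encoded item inside another is how the format expresses hierarchy.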
The model, however, is of more general utility. Many systems can use this model, even if they have no intention of using the rest of Z39.50. This paper will walk through key features of the model.
At the simplest level, we'll talk about five key objects:
The Database is the central object. It has some number of DatabaseRecords, the documents it knows about.
The Query is the user's request for information. It is passed to the Database, which returns a ResultSet that knows the list of all matching documents. Each document in that list is a RetrievalRecord, which has a format the client understands.
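The five objects can be sketched in a few lines of Python. This is a rough illustration, not part of Z39.50 itself: the method names (`search`, `record_at`), the substring matching, the 1-based positions, and the sample data are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseRecord:
    """Internal storage form; only the Database looks inside it."""
    fields: dict


@dataclass
class Query:
    """The user's request; just a bare search term in this sketch."""
    term: str


@dataclass
class RetrievalRecord:
    """What the client receives, in a format it understands."""
    text: str


@dataclass
class ResultSet:
    """Knows which records matched and hands them out by position."""
    matches: list

    def __len__(self) -> int:
        return len(self.matches)

    def record_at(self, position: int) -> RetrievalRecord:
        rec = self.matches[position - 1]  # positions are 1-based in this sketch
        return RetrievalRecord(f"{rec.fields['author']}: {rec.fields['title']}")


@dataclass
class Database:
    """Central object: accepts a Query and returns a ResultSet."""
    records: list = field(default_factory=list)

    def search(self, query: Query) -> ResultSet:
        term = query.term.lower()
        hits = [r for r in self.records
                if any(term in str(v).lower() for v in r.fields.values())]
        return ResultSet(hits)


db = Database([
    DatabaseRecord({"author": "Smithson", "title": "Indexing Basics"}),
    DatabaseRecord({"author": "Jones", "title": "Search Protocols"}),
])
results = db.search(Query("smithson"))
print(len(results), results.record_at(1).text)  # 1 Smithson: Indexing Basics
```

Note that the client only ever touches `Query`, `ResultSet`, and `RetrievalRecord`; the `DatabaseRecord` stays behind the `Database`.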
This model has several advantages over a more naive approach:
- It specifies the database as the object responsible for mediating queries and results.
- It separates the idea of the RetrievalRecord from the DatabaseRecord. The format and structure of DatabaseRecords is a matter of concern only for the Database: these records need have no obvious relationship to the RetrievalRecords a client will see.
- It makes the ResultSet explicit, and makes it clearly responsible for knowing the matched items, by position and format.
The full model is an expansion of the previous model. The diagram shows the key classes in the same position, but surrounds them with others that support the rest of the model's richness.
The expanded information affects the Query and the Result, with some extra information around the database.
A Query can be formed of a number of AccessPointClauses. Each of these consists of a reference to an AccessPoint and a Value. You can think of an AccessPoint as a query field, and a Value as a value to match to it. For example, "Author = 'Smithson'" is a possible AccessPointClause. (Z39.50 supports a number of query forms.)
Note that AccessPoints are associated with the database. They can have any desired relation to the DatabaseRecords. The AccessPoints will typically include access to a database's fields (e.g., author, title), but they could also refer to data about the record (e.g., date entered), or to something else entirely.
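One possible way to picture AccessPoints and AccessPointClauses in code is shown here. The class layout, the exact-match comparison, the implicit AND between clauses, and the internal record keys (`au`, `ti`, `_entered`) are assumptions made for illustration; Z39.50 itself supports richer query forms.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AccessPoint:
    """A named query field; the extractor decides what it actually reads."""
    name: str
    extract: Callable[[dict], str]


@dataclass(frozen=True)
class AccessPointClause:
    """Pairs an AccessPoint (by name) with a Value to match against it."""
    access_point: str
    value: str


class Database:
    def __init__(self, records, access_points):
        self._records = records
        # AccessPoints belong to the database, not to the query.
        self._access_points = {ap.name: ap for ap in access_points}

    def search(self, clauses):
        """Return the records that satisfy every clause (an implicit AND)."""
        def matches(record):
            return all(
                self._access_points[c.access_point].extract(record) == c.value
                for c in clauses
            )
        return [r for r in self._records if matches(r)]


# Hypothetical internal records: the key names are invisible to clients.
records = [
    {"au": "Smithson", "ti": "Indexing Basics", "_entered": "1996-03-01"},
    {"au": "Jones", "ti": "Search Protocols", "_entered": "1996-03-01"},
]
db = Database(records, [
    AccessPoint("Author", lambda r: r["au"]),             # a record field
    AccessPoint("Title", lambda r: r["ti"]),              # a record field
    AccessPoint("DateEntered", lambda r: r["_entered"]),  # data about the record
])

by_author = db.search([AccessPointClause("Author", "Smithson")])
print([r["ti"] for r in by_author])  # ['Indexing Basics']
```

Because each AccessPoint carries its own extractor, the database is free to change how records are stored without changing the names clients use in their queries.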
The right-hand side of the diagram shows several classes that pertain to the result. They essentially describe the transformation that converts a DatabaseRecord into a RetrievalRecord.
A DatabaseSchema is associated directly with the Database. It describes an abstract record structure understood by both client and server (in contrast to the structure of the DatabaseRecord, which the client doesn't care about). If a DatabaseSchema is applied to a DatabaseRecord, it produces an AbstractDatabaseRecord.
An ElementSpec defines the elements to be retrieved. In effect, it tells which fields of an AbstractDatabaseRecord are to be included. For example, a client may say "use the Brief ElementSpec", which returns only author, title, and year, rather than the full one, which returns all known information. (An ElementSpec transforms one AbstractDatabaseRecord into another AbstractDatabaseRecord.)
Finally, the RecordSyntax tells what format to use for the RetrievalRecord. This potentially lets clients choose whether to use plain text, Acrobat PDF, HTML, or other formats. In effect, it converts an AbstractDatabaseRecord to a RetrievalRecord.
The note on ResultSet specifies how a DatabaseRecord is transformed into a RetrievalRecord using the DatabaseSchema, ElementSpec, and RecordSyntax.
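The three-step transformation can be sketched as a small pipeline. Everything concrete here is an assumption for the sake of the example: the internal key names, the sample record, the "Full" element list, and the two output formats. Only the Brief set (author, title, year) comes from the description above.

```python
from html import escape

# Hypothetical server-internal record: terse keys the client never sees.
database_record = {"au": "Smithson", "ti": "Indexing Basics",
                   "yr": "1996", "abs": "An overview of indexing."}

# DatabaseSchema: maps internal keys to element names both sides understand.
DATABASE_SCHEMA = {"au": "author", "ti": "title", "yr": "year", "abs": "abstract"}

# ElementSpecs: which abstract elements to include.
ELEMENT_SPECS = {
    "Brief": ["author", "title", "year"],
    "Full": ["author", "title", "year", "abstract"],
}


def apply_schema(record: dict) -> dict:
    """DatabaseRecord -> AbstractDatabaseRecord."""
    return {DATABASE_SCHEMA[k]: v for k, v in record.items() if k in DATABASE_SCHEMA}


def apply_element_spec(abstract_record: dict, spec_name: str) -> dict:
    """AbstractDatabaseRecord -> (smaller) AbstractDatabaseRecord."""
    return {name: abstract_record[name]
            for name in ELEMENT_SPECS[spec_name] if name in abstract_record}


def to_text(abstract_record: dict) -> str:
    return "\n".join(f"{k}: {v}" for k, v in abstract_record.items())


def to_html(abstract_record: dict) -> str:
    items = "".join(f"<dt>{escape(k)}</dt><dd>{escape(v)}</dd>"
                    for k, v in abstract_record.items())
    return f"<dl>{items}</dl>"


# RecordSyntaxes: AbstractDatabaseRecord -> RetrievalRecord.
RECORD_SYNTAXES = {"text": to_text, "html": to_html}


def retrieve(record: dict, spec_name: str, syntax_name: str) -> str:
    """Chain the three steps: schema, then element spec, then record syntax."""
    abstract = apply_schema(record)
    selected = apply_element_spec(abstract, spec_name)
    return RECORD_SYNTAXES[syntax_name](selected)


print(retrieve(database_record, "Brief", "text"))
```

Each step can vary independently: the same record can be returned as Brief text, Full text, or Brief HTML without the stored record changing.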
In some ways, this structure seems complex, but everything is there to do a job. The "extra" objects help identify which objects are known to the client, the server, or both. They provide flexibility along dimensions that have been known to change. They make explicit the transformations that may not be immediately apparent.
The complexity of the Z39.50 information retrieval model should be seen as richness that enables this model to describe many retrieval systems.
- The Rose Model used to develop these pictures.
- The Library of Congress Z39.50 Maintenance Agency Page.