Chapter 2. Graph: From Class... Suppose you've been developing programs for a
while, and you keep finding yourself building graphs - the node and edge kind -
over and over. So, you've built a graph class, which you move from one project
to the next, improving it as you go. You've heard of frameworks, and you think your
graph might make a good mini-framework. A framework has a scope of application - it
can't be all things to everybody. What applications might use your graph
framework? • web site manager • traffic simulator • object modeling tool Web
Site Manager You'd like a tool to help maintain your web
site. You can treat the web as a graph: each page is a node, and links between
pages are edges. This tool might analyze your site: • Are any pages unreachable? Any have no exit? • Which pages have high in-counts? They may be
landmarks. • Which pages have high out-counts? They may be
index pages. • What's the readability of a page? • Is each page valid HTML? Traffic
Simulator You might need to simulate traffic in a city, to
decide where new streets should be placed. You can treat intersections as
nodes, and streets as edges; each edge might have a weight indicating its
capacity. (Or maybe it's better to treat intersections as edges and streets as
nodes.) You'd build a model of the street, and simulate
traffic. Perhaps you assign a set of probabilities for each intersection,
indicating where traffic might go. Then you run your simulation with various
changes. Object
Modeling Tool We want a tool to represent the object model of
our program, as a UML diagram. Classes will be nodes, and relationships between
classes will be edges. Both nodes and edges may have a lot of
information attached. Common
Characteristics The graphs in these applications share a number
of characteristics: • They have a moderate number of nodes
(hundreds, possibly thousands, but not millions). • There is information attached to nodes, edges,
or both. • The edges are directed (and the direction is
important). • There may be constraints on the graph. (For
example, in the modeling tool, a class may refer to itself, but may not inherit
from itself.) • The applications could benefit from a generic
graph. Approach ·
Original
package ·
Seek
the model ·
Be
ruthlessly consistent ·
Choose
names carefully ·
Keep
the secret ·
Document! ·
Define
what happens ·
Keep
independent features separate Our Existing Class
We want to grow our existing graph class into a
framework. In this chapter, we'll improve the class on its
own (in terms of good class design). In the next chapter, we'll mold it into
more of a framework. Along the way, we'll acquire a number of guidelines that
can improve classes or frameworks. Without further ado, here is our graph: // v0.1 public class Node { // How to use: subclass Node and add any // fields you need for your information. public String name; // The name of the node public Vector edges = new Vector(); public Node(String name) { this.name = name;}
public void addNode(Node n) { edges.addElement(n);}
public void removeEdge(Node n) { edges.removeElement(n);} public Node[] getNodes() { Node[] result = new Node[edges.size()]; for (int i = 0; i < result.length; i++) { result[i] = edges.elementAt(i); } return result; } public Enumeration pathTo (Node goal) { Vector path = new Vector(this); if (this == goal) return path.elements; Vector queue = new Vector(); queue.addElement(path);
while (queue.size() != 0) { // Breadth-first search Vector path = (Vector) queue.firstElement(); queue.removeElementAt(0); Node n = (Node)path.lastElement(); for (int i = 0; i < n.edges.size(); i++) { Node child = (Node)n.edges.elementAt(i); if (child == goal) { path.addElement(child); return path.elements(); } else if (!path.contains(child)) { Vector cloned = path.clone(); cloned.addElement(child); queue.addElement(cloned); } else { // child was in path already // loop - ignore path } } // end for } // end while return null; // all paths exhausted } This is not a perfect
class by any stretch. Perhaps it's not terrible, but it definitely has flaws.
How can we make it better? These are the rules
we’ll apply: [TBD]
For many frameworks, it
can be tricky or difficult to find the model behind the framework. For graphs,
we're lucky: graph theory is a well-known part of mathematics. From
http://www.astro.virginia.edu/~ewwlen/math:
***** "Graph. A
mathematical object composed of points known as Vertices or Nodes, and lines
connecting some (possibly empty) subset of them, known as Edges. Edge (Graph). For an
undirected Graph, an unordered pair of nodes which specify the line connecting
them. For a directed graph, the edge is an ordered pair of nodes. Digraph. A Graph in
which each Edge is replaced by a directed Edge." We can phrase this in a
more set-like way: G = <N, E> A
graph has nodes and edges N & E = 0 Nodes
and edges are separate things E <= N x N An
edge is an ordered pair of nodes We often visualize a
graph using bubbles and arrows:
We
spoke of our class as representing graphs, but when we compare it to the
mathematical definition, we can see that it represents the Node, while Graph
and Edge are implicit!
For explicit
representation: The graph as a whole is maintained by the clients of node;
centralizing it will let us track the graph as an explicit entity. For Edge -
we have applications that attach information to edges. If they're not explicit,
where can we attach the information? Against explicit
representation: The implicit representation saves space: we don't have to
create Edge objects when they have no separate attributes. Plus the class has
worked pretty well so far without them. Decision:
We will explicitly represent Graph and Edge.
Result: public class Graph { ...} public class Node { ...} public class Edge { ...} like this: // v0.2 public class Graph { public Enumeration getNodes(); public Enumeration getEdges(); } public class Node { public String name; private Vector edges; public Node (String name) { ...} public void addEdge(Edge e) { ...} // was addNode() public Edge[] getEdges() { ...} public Node[] getNodes() { ...} // still wanted? public void removeEdge(Edge e) { ...} public Enumeration pathTo(Node n) { ...} } public class Edge { public Edge(Node fromNode, Node toNode) { ...} public Node getFromNode() { ...} public Node getToNode() { ...} }
Keep an eye out for
places where you have been inconsistent. This can be small things, like using
the same abbreviation everywhere (not "Num" in one place and
"No" in another), or larger things, like consistent method or class
names. Consistency pays off in
the future: when you examine code, it works the same as other code you've
already seen. When you decide to merge classes, you'll have fewer style
clashes. In the graph class, the
most obvious inconsistency is that we return arrays some times, and
Enumerations others.
For
arrays: Arrays tell exactly what the type is - they don't require casts. (So,
"Edge[]" contains edges, enforced by the language.
"Enumeration" could be anything, enforced only by convention.) For
enumerations: The whole collection need not be generated all at once - access
is controlled rather than arbitrary. An enumeration can be generated from many
sorts of collections, while an array has a fixed form. If we return arrays, and
the internal format is not an array, it implies a copy must be made. Decision:
We will use Enumerations. (The argument about not forcing a particular internal
representation is the deciding factor.) It's a close call; in some
circumstances you might prefer arrays. *****
TBD - collections Thus, we'll make a
couple changes in Node: public Enumeration getEdges(); public Enumeration getNodes(); The other names and
types are pretty consistent; we'll live with them for now. Emerson: "A foolish
consistency is the hobgoblin of little minds."
One rule people often
use is to make class names be nouns or noun phrases (such as Container or JInternalFrame), methods be verbs
(e.g., paint()), and attributes be nouns (e.g., getIcon() or getText()). (See http://java.sun.com/docs/codeConv/html). For interfaces,
you'll often see either a noun phrase (such as TreeCellEditor), or a verb
+ "able" when it's an interface to mix in features (e.g., Linkable, Undoable). You want
concise, consistent names. When you have pairs, you want names that go
together, e.g., insertElementAt(), removeElementAt(). For our
Graph classes, the names of the Edge methods getFromNode() and getToNode() are weak.
It took a while to recall a better name pair: "source" and
"sink" or perhaps "source" and "target".
public Node getSource() { ...} public Node getSink() { ...} A dictionary
and a thesaurus can be very helpful in choosing just the right name.
A class
should encapsulate some secret. For example, the representation of an object
should be hidden, so clients depend on the interface alone. This way, a change
in the implementation requires no changes in the clients. What about
our class? Can we tell how it's represented?
Using attributes couples clients to a decision subject to change.
The day will come when you change your mind: you want to count accesses, store
the attribute somewhere else, etc. If everybody who uses the class accesses it
directly, you have no freedom to change the attributes.
Moving to getter and setter methods, as in the previous rule, are
often an improvement but still not enough. Try to go even further - what’s the
use of the attribute going to do with the information? Can you do it for them?
If all you’re doing is making a C-style “struct”, the object is not very
object-oriented.
The name attribute is currently declared public. We'll hide it
behind methods: public void setName (String name) { ...} public String getName () { ...} Can we tell
how the graph is implemented? Much of the
representation is hidden. We can't tell whether the node is an array,
vector, or some other structure, or if the edge is stored with the graph
object, the node, or somewhere else entirely. There is a
place where we can hide representations better, though. Recall how we introduced explicit Edge
objects in order to reflect the full model. A node must be able to report its
edges, which implies Edge objects must exist somewhere. But must
they be permanent? We can introduce a test for equality: public boolean equals (Edge e) { ...} This way,
when a node reports an edge object, it has the freedom to create one on the
spot (rather than requiring it to have "independent" existence). We're not forbidding
edges to exist. Most implementations might well create explit edges; they would
just define equals() to be "==". By not
requiring edges, we allow for a natural case: when edges have no data of their
own, and most traversal is from node to node, our graph can use the original
representation of storing a Vector of Nodes rather than Edges. What about
node equality? A similar argument applies.
* By Java conventions, we also redefine hashCode(), to ensure
that two equal components have equal hash codes. Finally, how
are Graphs and Nodes created? We've defined a constructor for Node, but how
does a graph find out which nodes it contains?
One way to
handle this problem would be to add a method: public void add (Node n) { ...} // but... Then we
could create a node, and add it to the graph: Graph g = new Graph(); Node n = new Node("test"); g.add(n); // but... We let Graph
track Nodes, and Nodes track Edges. (But wait: Graph can tell us what the Edges
are - it must go to the Nodes to get them. Is that OK?) There is an
alternative, known as a Factory Method. (We'll discuss this more in the chapter
on patterns. TBD) A factory method is a
method that creates another object. The constructor of the other object is
usually made unavailable after that. In class
Graph, add these methods: public void makeNode (String name) { ...} public void makeEdge (Node n1, Node n2) { ...} Hide the
Node constructor, and move responsibility for Edges into the Graph. Graph now
owns responsibility for creating nodes and edges. Since Graph creates node and
edges, we'll make it responsible for deleting them as well.
What's our
code now? public class Graph { public Graph() { ...} public Node makeNode (String name) { ...} public Edge makeEdge (Node n1, Node n2) { ...} public Enumeration getNodes() { ...} public Enumeration getEdges() { ...} public void remove (Node n) { ...} public void remove (Edge e) { ...} } public class Node { public String getName() { ...} public void setName (String name) { ...} public Enumeration getEdges() { ...} public Enumeration getNodes() { ...} public Enumeration pathTo (Node n) { ...} public boolean equals (Object n) { ...} public int hashCode() { ...} } public class Edge { public Node getSource() { ...} public Node getSink() { ...} public boolean equals(Object e) { ...} public int hashCode() { ...} } [TBD -
summarize]
As you move
your framework from being a private entity to becoming a public one, you must
improve the documentation so other people can use it. (For one thing - they're
not experts in your subject - that's why they want a framework in the first
place!) In The Mythical Man-Month, Brooks says that a public documented
product will take about 9 times the effort to produce. Where do we
stand? Well, the
2-line comment: "How to use: subclass Node, and add any fields you need
for your information" is not universally regarded as adequate
documentation for a framework. What is
needed? I'd suggest at least three things: javadoc comments in the code (so a
reference manual can be automatically generated), a framework user's guide, and
a test gauge. The javadoc should explain each public or protected class and
method. The user's guide should contain an overview, to explain the overall
philosophy, and a set of task-oriented examples. The test gauge helps
demonstrate that the framework is working. For the
graph class, a minimal user guide would include: Overview: What is a Graph? How to create a graph How to make nodes contain
special information How to make edges contain
special information How to define a graph algorithm Index Provide
concrete source code for complete, runnable applications. This lets people move
from a working example to the application they want to build. You'll rarely
hear a complaint "Too many examples."
Bertrand
Meyer, the inventor of Eiffel, defines contracts in terms of pre-conditions and
post-conditions: Pre-condition: If you do this... Post-condition: This will
happen. Under this
regimen, it's the caller's responsibility to ensure that preconditions are met,
and the callee's responsibility to fulfill the contract. (Meyer also defines a
class invariant that will be true for the class at (almost) all times.) ***** Expand
to two separate topics. You may not
want to develop these contracts completely, but users need to know what is
legal for each argument. For each return value, you need to tell what values
are possible.
There's
nothing wrong with restricting values - not documenting the restrictions is a
problem. For example,
in our code we do not want to allow edges to have one of their nodes be null.
(We never expect getSource() or getSink() to return
null either.) This is a reasonable restriction. If you don't
tell people what's legal, they only have one way to find out: try it and see
what happens. That's no good though, because it only tells them about today. If
you haven't said what's right, they can't trust it tomorrow.
If one
feature can change independently of another, don't provide an interface that
couples them. The graph
classes are pretty good on this front (most methods have zero or one argument).
So, we'll make up an example. Suppose you create a document class and have a
method: public void setStyle (Font f, int mods, int size) [TBD - Regard a style as
being…] Can we conceive of
changing the size without changing the font? Of course. So we end up with
client code like this: setStyle (getFont(), getModifiers(), newSize); We've had to fetch the
old font and modifiers just to pass them in again - wasted work! It can be even worse.
Suppose we don't have getFont() or getModifiers() methods; instead we
have a getStyle() method that returns them. Then we have to call getStyle(), save the result, pull
the pieces out, and pass them to setStyle(). (Odds are good that
would require another class to hold the style information.) Or, worst of all,
suppose that the class doesn't provide any method to access the current style.
Then you have to track it through variables external to the class. The "compound"
setStyle() method is really only
acceptable as a convenience method: if all three attributes are often set
together, and the individual "set" methods are available, then we can
define public void setStyle (Font f, int modifiers, int size) { setFont(f); setModifiers(modifiers); setSize(size); } You usually become aware
of the need for these convenience methods after you've been using the class for
a while, and usage patterns recur. Be suspicious of methods
with a lot of parameters. Try to keep independent features separate. ***** Decision - what
can change, what won't? | ||||||||||||||