Chapter 2. Graph: From Class...

 

Suppose you've been developing programs for a while, and you keep finding yourself building graphs - the node and edge kind - over and over. So, you've built a graph class, which you move from one project to the next, improving it as you go.

 

You've heard of frameworks, and you think your graph might make a good mini-framework.

 

A framework has a scope of application - it can't be all things to everybody. What applications might use your graph framework?

• web site manager

• traffic simulator

• object modeling tool

 

 

Web Site Manager

You'd like a tool to help maintain your web site. You can treat the web as a graph: each page is a node, and links between pages are edges.

 

This tool might analyze your site:

• Are any pages unreachable? Any have no exit?

• Which pages have high in-counts? They may be landmarks.

• Which pages have high out-counts? They may be index pages.

• What's the readability of a page?

• Is each page valid HTML?

 

 

Traffic Simulator

You might need to simulate traffic in a city, to decide where new streets should be placed. You can treat intersections as nodes, and streets as edges; each edge might have a weight indicating its capacity. (Or maybe it's better to treat intersections as edges and streets as nodes.)

 

You'd build a model of the street, and simulate traffic. Perhaps you assign a set of probabilities for each intersection, indicating where traffic might go. Then you run your simulation with various changes.

 

 

Object Modeling Tool

We want a tool to represent the object model of our program, as a UML diagram. Classes will be nodes, and relationships between classes will be edges.

 

Both nodes and edges may have a lot of information attached.

 

 

Common Characteristics

The graphs in these applications share a number of characteristics:

• They have a moderate number of nodes (hundreds, possibly thousands, but not millions).

• There is information attached to nodes, edges, or both.

• The edges are directed (and the direction is important).

• There may be constraints on the graph. (For example, in the modeling tool, a class may refer to itself, but may not inherit from itself.)

• The applications could benefit from a generic graph.

 

 

Approach

·        Original package

·        Seek the model

·        Be ruthlessly consistent

·        Choose names carefully

·        Keep the secret

·        Document!

·        Define what happens

·        Keep independent features separate

 

 

 

Our Existing Class

 

We want to grow our existing graph class into a framework.

 

In this chapter, we'll improve the class on its own (in terms of good class design). In the next chapter, we'll mold it into more of a framework. Along the way, we'll acquire a number of guidelines that can improve classes or frameworks.

 

Without further ado, here is our graph:

 

// v0.1

public class Node {

      // How to use: subclass Node and add any

      // fields you need for your information.

 

      public String name;     // The name of the node

 

      public Vector edges = new Vector();

 

      public Node(String name) { this.name = name;}

     

      public void addNode(Node n) { edges.addElement(n);}

     

      public void removeEdge(Node n) { edges.removeElement(n);}

 

      public Node[] getNodes() {

            Node[] result = new Node[edges.size()];

            for (int i = 0; i < result.length; i++) {

                  result[i] = edges.elementAt(i);

            }

            return result;

      }

 

      public Enumeration pathTo (Node goal) {

            Vector path = new Vector(this);

            if (this == goal) return path.elements;

 

            Vector queue = new Vector();

            queue.addElement(path);

           

            while (queue.size() != 0) {     // Breadth-first search

                  Vector path = (Vector) queue.firstElement();

                  queue.removeElementAt(0);

                  Node n = (Node)path.lastElement();

 

                  for (int i = 0; i < n.edges.size(); i++) {

                        Node child = (Node)n.edges.elementAt(i);

                        if (child == goal) {

                              path.addElement(child);

                              return path.elements();

                        }  else if (!path.contains(child)) {

                              Vector cloned = path.clone();

                              cloned.addElement(child);

                              queue.addElement(cloned);

                        }  else {     // child was in path already

                              // loop - ignore path

                        }

 

                  }     // end for

            }   // end while

 

            return null;      // all paths exhausted

      }

 

 

This is not a perfect class by any stretch. Perhaps it's not terrible, but it definitely has flaws. How can we make it better?

 

These are the rules we’ll apply:

[TBD]

 

 

 

Seek the model.

 

For many frameworks, it can be tricky or difficult to find the model behind the framework. For graphs, we're lucky: graph theory is a well-known part of mathematics.

 

From http://www.astro.virginia.edu/~ewwlen/math:  *****

"Graph. A mathematical object composed of points known as Vertices or Nodes, and lines connecting some (possibly empty) subset of them, known as Edges.

 

Edge (Graph). For an undirected Graph, an unordered pair of nodes which specify the line connecting them. For a directed graph, the edge is an ordered pair of nodes.

 

Digraph. A Graph in which each Edge is replaced by a directed Edge."

 

We can phrase this in a more set-like way:

                G = <N, E>                     A graph has nodes and edges

                N & E = 0                       Nodes and edges are separate things

                E <= N x N                     An edge is an ordered pair of nodes

 

We often visualize a graph using bubbles and arrows:

 

 

                               

 

 Where's the graph?!

 

We spoke of our class as representing graphs, but when we compare it to the mathematical definition, we can see that it represents the Node, while Graph and Edge are implicit!

 

 

 Should we represent Graph and Edge explicitly?

For explicit representation: The graph as a whole is maintained by the clients of node; centralizing it will let us track the graph as an explicit entity. For Edge - we have applications that attach information to edges. If they're not explicit, where can we attach the information?

 

Against explicit representation: The implicit representation saves space: we don't have to create Edge objects when they have no separate attributes. Plus the class has worked pretty well so far without them.

 

Decision: We will explicitly represent Graph and Edge.

 

 We'll keep the cost of an explicit edge in mind though.

 

 

 

Result:

      public class Graph { ...}

      public class Node { ...}

      public class Edge { ...}

 

like this:

 

      // v0.2

public class Graph {

      public Enumeration getNodes();

      public Enumeration getEdges();

}

 

public class Node {

      public String name;

      private Vector edges;

      public Node (String name) { ...}

      public void addEdge(Edge e) { ...}        // was addNode()

      public Edge[] getEdges() { ...}

      public Node[] getNodes() { ...}           // still wanted?

      public void removeEdge(Edge e) { ...}

      public Enumeration pathTo(Node n) { ...}

}

 

public class Edge {

      public Edge(Node fromNode, Node toNode) { ...}

      public Node getFromNode() { ...}

      public Node getToNode() { ...}

}

 

 


 Be ruthlessly consistent.

 

Keep an eye out for places where you have been inconsistent. This can be small things, like using the same abbreviation everywhere (not "Num" in one place and "No" in another), or larger things, like consistent method or class names.

 

Consistency pays off in the future: when you examine code, it works the same as other code you've already seen. When you decide to merge classes, you'll have fewer style clashes.

 

In the graph class, the most obvious inconsistency is that we return arrays some times, and Enumerations others.

 

 Arrays or Enumerations?

 

For arrays: Arrays tell exactly what the type is - they don't require casts. (So, "Edge[]" contains edges, enforced by the language. "Enumeration" could be anything, enforced only by convention.)

 

For enumerations: The whole collection need not be generated all at once - access is controlled rather than arbitrary. An enumeration can be generated from many sorts of collections, while an array has a fixed form. If we return arrays, and the internal format is not an array, it implies a copy must be made.

 

Decision: We will use Enumerations. (The argument about not forcing a particular internal representation is the deciding factor.) It's a close call; in some circumstances you might prefer arrays.

 

***** TBD - collections

 

 

Thus, we'll make a couple changes in Node:

      public Enumeration getEdges();

      public Enumeration getNodes();

 

The other names and types are pretty consistent; we'll live with them for now.

 

Emerson: "A foolish consistency is the hobgoblin of little minds."


 

 Choose names carefully.

 

One rule people often use is to make class names be nouns or noun phrases (such as Container or JInternalFrame), methods be verbs (e.g., paint()), and attributes be nouns (e.g., getIcon() or getText()). (See http://java.sun.com/docs/codeConv/html).

 

For interfaces, you'll often see either a noun phrase (such as TreeCellEditor), or a verb + "able" when it's an interface to mix in features (e.g., Linkable, Undoable).

 

You want concise, consistent names. When you have pairs, you want names that go together, e.g., insertElementAt(), removeElementAt().

 

For our Graph classes, the names of the Edge methods getFromNode() and getToNode() are weak. It took a while to recall a better name pair: "source" and "sink" or perhaps "source" and "target".

 

 

Rename the methods to use "source" and "sink."

 

public Node getSource() { ...}

public Node getSink() { ...}

 

A dictionary and a thesaurus can be very helpful in choosing just the right name.

 


 The Magician's Rule: Keep the secret.

 

A class should encapsulate some secret. For example, the representation of an object should be hidden, so clients depend on the interface alone. This way, a change in the implementation requires no changes in the clients.

 

What about our class? Can we tell how it's represented?

 

 

 Don't expose attributes in public.

 

Using attributes couples clients to a decision subject to change. The day will come when you change your mind: you want to count accesses, store the attribute somewhere else, etc. If everybody who uses the class accesses it directly, you have no freedom to change the attributes.

 

 

 Don't build “data bags.”

 

Moving to getter and setter methods, as in the previous rule, are often an improvement but still not enough. Try to go even further - what’s the use of the attribute going to do with the information? Can you do it for them? If all you’re doing is making a C-style “struct”, the object is not very object-oriented.

 

 Wrap the name attribute.

 

The name attribute is currently declared public. We'll hide it behind methods:

      public void setName (String name) { ...}

      public String getName () { ...}

 

 

Can we tell how the graph is implemented?

 

Much of the representation is hidden. We can't tell whether the node is an array, vector, or some other structure, or if the edge is stored with the graph object, the node, or somewhere else entirely.

 

There is a place where we can hide representations better, though.  Recall how we introduced explicit Edge objects in order to reflect the full model. A node must be able to report its edges, which implies Edge objects must exist somewhere. 

 

But must they be permanent? We can introduce a test for equality:

      public boolean equals (Edge e) { ...}

This way, when a node reports an edge object, it has the freedom to create one on the spot (rather than requiring it to have "independent" existence).

 

We're not forbidding edges to exist. Most implementations might well create explit edges; they would just define equals() to be "==". By not requiring edges, we allow for a natural case: when edges have no data of their own, and most traversal is from node to node, our graph can use the original representation of storing a Vector of Nodes rather than Edges.

 

What about node equality? A similar argument applies.

 

 Test nodes and edges for equality using equals(). *

 

* By Java conventions, we also redefine hashCode(), to ensure that two equal components have equal hash codes.

 

Finally, how are Graphs and Nodes created? We've defined a constructor for Node, but how does a graph find out which nodes it contains?

 

 

 Our Graph has accessor methods, but no modifying methods - it has no way to be told what's in the graph!

 

 

One way to handle this problem would be to add a method:

      public void add (Node n) { ...}     // but...

Then we could create a node, and add it to the graph:

      Graph g = new Graph();

      Node n = new Node("test");

      g.add(n);         // but...

 

We let Graph track Nodes, and Nodes track Edges. (But wait: Graph can tell us what the Edges are - it must go to the Nodes to get them. Is that OK?)

 

There is an alternative, known as a Factory Method. (We'll discuss this more in the chapter on patterns.  TBD) A factory method is a method that creates another object. The constructor of the other object is usually made unavailable after that.

 

In class Graph, add these methods:

      public void makeNode (String name) { ...}

      public void makeEdge (Node n1, Node n2) { ...}

Hide the Node constructor, and move responsibility for Edges into the Graph.

 

Graph now owns responsibility for creating nodes and edges. Since Graph creates node and edges, we'll make it responsible for deleting them as well.

 

 

 Make Graph use a Factory Method for Nodes and Edges.

 

What's our code now?

 

public class Graph {

      public Graph() { ...}

      public Node makeNode (String name) { ...}

      public Edge makeEdge (Node n1, Node n2) { ...}

      public Enumeration getNodes() { ...}

      public Enumeration getEdges() { ...}

      public void remove (Node n) { ...}

      public void remove (Edge e) { ...}

}

 

public class Node {

      public String getName() { ...}

      public void setName (String name) { ...}

      public Enumeration getEdges() { ...}

      public Enumeration getNodes() { ...}

      public Enumeration pathTo (Node n) { ...}

      public boolean equals (Object n) { ...}

      public int hashCode() { ...}

}

 

public class Edge {

      public Node getSource() { ...}

      public Node getSink() { ...}

      public boolean equals(Object e) { ...}

      public int hashCode() { ...}

}

 

[TBD - summarize]


 

 Document! Document! Document!

 

As you move your framework from being a private entity to becoming a public one, you must improve the documentation so other people can use it. (For one thing - they're not experts in your subject - that's why they want a framework in the first place!) In The Mythical Man-Month, Brooks says that a public documented product will take about 9 times the effort to produce.

 

Where do we stand?

 

Well, the 2-line comment: "How to use: subclass Node, and add any fields you need for your information" is not universally regarded as adequate documentation for a framework.

 

What is needed? I'd suggest at least three things: javadoc comments in the code (so a reference manual can be automatically generated), a framework user's guide, and a test gauge. The javadoc should explain each public or protected class and method. The user's guide should contain an overview, to explain the overall philosophy, and a set of task-oriented examples. The test gauge helps demonstrate that the framework is working.

 

For the graph class, a minimal user guide would include:

                Overview: What is a Graph?

                How to create a graph

                How to make nodes contain special information

                How to make edges contain special information

                How to define a graph algorithm

                Index

 

Provide concrete source code for complete, runnable applications. This lets people move from a working example to the application they want to build. You'll rarely hear a complaint "Too many examples."

 


 

 Define what happens.

 

Bertrand Meyer, the inventor of Eiffel, defines contracts in terms of pre-conditions and post-conditions:

 

                Pre-condition: If you do this...

                Post-condition: This will happen.

 

Under this regimen, it's the caller's responsibility to ensure that preconditions are met, and the callee's responsibility to fulfill the contract. (Meyer also defines a class invariant that will be true for the class at (almost) all times.)

 

***** Expand to two separate topics.

 

You may not want to develop these contracts completely, but users need to know what is legal for each argument. For each return value, you need to tell what values are possible.

 

 

 

Numeric (short, int, etc.)

Can it be negative? 0? positive? Is there a lower limit? upper limit?

Character

Is 0 allowed? Any restrictions (e.g., ASCII only)?

String

Null allowed? 0-length string? Character restrictions?

Object

Null allowed? Type restrictions?

Array

Null allowed? 0-length array? Min/max size?

Enumeration

Null allowed? What types are returned?

 

 

There's nothing wrong with restricting values - not documenting the restrictions is a problem.

 

For example, in our code we do not want to allow edges to have one of their nodes be null. (We never expect getSource() or getSink() to return null either.) This is a reasonable restriction.

 

If you don't tell people what's legal, they only have one way to find out: try it and see what happens. That's no good though, because it only tells them about today. If you haven't said what's right, they can't trust it tomorrow.


 

 Keep independent features separate.

 

If one feature can change independently of another, don't provide an interface that couples them.

 

The graph classes are pretty good on this front (most methods have zero or one argument). So, we'll make up an example. Suppose you create a document class and have a method:

      public void setStyle (Font f, int mods, int size)

[TBD - Regard a style as being…]

 

Can we conceive of changing the size without changing the font? Of course. So we end up with client code like this:

      setStyle (getFont(), getModifiers(), newSize);

We've had to fetch the old font and modifiers just to pass them in again - wasted work!

 

It can be even worse. Suppose we don't have getFont() or getModifiers() methods; instead we have a getStyle() method that returns them. Then we have to call getStyle(), save the result, pull the pieces out, and pass them to setStyle(). (Odds are good that would require another class to hold the style information.)

 

Or, worst of all, suppose that the class doesn't provide any method to access the current style. Then you have to track it through variables external to the class.

 

The "compound" setStyle() method is really only acceptable as a convenience method: if all three attributes are often set together, and the individual "set" methods are available, then we can define

      public void setStyle (Font f, int modifiers, int size) {

            setFont(f);  setModifiers(modifiers);  setSize(size);

      }

You usually become aware of the need for these convenience methods after you've been using the class for a while, and usage patterns recur.

 

Be suspicious of methods with a lot of parameters. Try to keep independent features separate.

 

***** Decision - what can change, what won't?</