Chapter 7, Part 2: Layered Frameworks

TBD - Layers depend “down”.

Layered frameworks

An alternative approach to monolithic structures is to build systems based on layers: things at level n+1 depend directly on level n.
In the ideal case, a level directly uses only the level beneath it (not levels lower down), and is not aware of layers above it. There are usually a number of services made available at a particular level. Java takes this approach to some extent, though not as radically as it might have. The diagram indicates that we can use the core libraries by themselves, but if we use servlets we will need both servlets and the core. What about javamail? It probably doesn't depend on servlets (but that depends on the person designing the system). Sometimes drawings show dependencies, with parts sticking out to show that lower levels may be directly accessible.
TBD - new example

Are layers good or bad? Mostly good. They let you take a "virtual machine" approach, treating lower levels as a black box, worrying about the current level and the one beneath it only. "Levels" provide an approach that lets you gain intellectual control over your system.

Does layering have any limitations? Yes: not all systems have clear layers. For example, there is a mutual dependency between java.lang and java.io. We can't put either layer on top, so they have to "go together."

Layering

To help organize and maintain intellectual control of software, one common approach is to build it in layers. In this way, rather than needing to understand all pieces at once, we can work a layer at a time.

Most applications provide an example of this structuring (in the large). Suppose you develop a news reader program in Java. This will be its structure:

News Reader
Java Virtual Machine
Operating System
Hardware

As application developers, we can concentrate on our news reader, relying on the facilities provided by the JVM, the operating system, and the hardware. In this way, we leverage thousands of person-years of prior work. (This is much easier than starting with a bucket of silicon or logic gates.)

The layering approach can apply within our application as well:
The layering approach is sometimes called the virtual machine approach. It is a recursive concept. The idea is that a news reader is too complex to develop from scratch, but if we could rely on an NNTP protocol machine to make it easier, we could implement the user interface relatively quickly. The only problem is - we don’t have an NNTP protocol machine. (Here’s the recursion:) An NNTP machine would be relatively easy to write if we had a network interconnection machine. Notice that this machine is simpler than our original need (e.g., it needn’t deal with the user interface). Eventually, we get to a machine that’s simple because we rely on one provided by someone else (perhaps the Java Virtual Machine).

We described this as a top-down approach. But in actual practice, especially as framework developers, we often need to work the other way (bottom-up). This has a couple of risks:

· We might miss our target needs.
· Particular users pay for the whole layer’s generality.

Consider how the Java designers addressed this: they put out a basic implementation, then added a number of useful auxiliary services. If our job is news reader design, Java has both too much and too little. Too much, because we may not need Enterprise JavaBeans or servlets or RMI. Too little, because it doesn’t directly support NNTP or newsgroup thread tracking or other things we might want.

Layering for Flexibility

Intellectual control is one reason to structure software into layers. A second reason is for flexibility. By using well-defined interfaces between layers, we can replace a layer with another implementation that provides the same interface. This lets us use different approaches to address non-functional requirements: performance, reliability, cost, etc.

Java demonstrates this flexibility. Suppose you have an Intel PC. You can run Windows NT, Linux, or Solaris. On top of that you can run a JVM from Sun or Symantec. On the JVM you run your Java application.
LAYER                 PROVIDER
Your app              You
JVM                   Sun, Symantec, etc.
OS                    Windows, Solaris, Linux
Intel Architecture    Intel
Actual hardware       Intel, AMD, etc.

In the ideal case, you can think of layers as a stack of blocks. You can slide in a replacement, e.g., a new JVM. In reality, it's a little bumpier: JVMs depend on the operating system they're on. If you try to slide out just the JVM, you find it "sticks".
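Within an application, the "slide in a replacement" idea can be sketched with a Java interface standing for the boundary between two layers. This is only an illustration under stated assumptions: `ArticleSource`, `InMemorySource`, `CachingSource`, and `HeadlineView` are hypothetical names, not part of any real news reader or Java library. The upper layer is written against the interface alone, so either lower-layer implementation can be dropped in.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical boundary between the UI layer and the layer beneath it.
interface ArticleSource {
    List<String> headlines();
}

// One lower-layer implementation: serves a fixed list.
class InMemorySource implements ArticleSource {
    private final List<String> items;
    InMemorySource(String... items) { this.items = Arrays.asList(items); }
    public List<String> headlines() { return new ArrayList<>(items); }
}

// A replacement with different non-functional behavior:
// it asks the wrapped source once and reuses the answer.
class CachingSource implements ArticleSource {
    private final ArticleSource inner;
    private List<String> cached;
    int fetches = 0; // how many times the wrapped source was consulted
    CachingSource(ArticleSource inner) { this.inner = inner; }
    public List<String> headlines() {
        if (cached == null) {
            cached = inner.headlines();
            fetches++;
        }
        return new ArrayList<>(cached);
    }
}

// Upper layer: knows only the interface, not which implementation is below.
class HeadlineView {
    private final ArticleSource source;
    HeadlineView(ArticleSource source) { this.source = source; }
    String render() { return String.join("\n", source.headlines()); }
}

class SwapLayerDemo {
    public static void main(String[] args) {
        ArticleSource plain = new InMemorySource("First post", "Re: First post");
        ArticleSource cached = new CachingSource(plain);
        // Same upper layer, two different lower layers.
        System.out.println(new HeadlineView(plain).render());
        System.out.println(new HeadlineView(cached).render());
    }
}
```

The view renders the same text with either source; only the cost profile of the lower layer changes. The "sticking" described above shows up when an implementation leaks details that the interface does not declare.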
You can assemble the pieces in the frame to get a runnable application. (Think of this like those “funny face” books where you can match one set of eyes to another nose and another mouth, or a puzzle in a frame.)

Improving layers

By layering, you let the pieces improve separately. The layers are usually coherent entities, so it's fairly easy to work within a layer. By keeping the interface the same, you can "drop in" the improved part without changing the other layer.

So, suppose the underlying machine becomes twice as fast. We can also move from an interpreter-based JVM to a JIT (Just-In-Time compiler) for a 5x speedup. If we drop in these two improved layers, we get a 2*5=10x speedup. We didn't have to update our operating system or change our application. Perhaps next week there's a bug fix release of the operating system. You drop it in, and everything runs with fewer crashes (though perhaps no faster).

Notice how the layers are independent: the hardware engineer doesn't have to know much about editor applications. The operating system person doesn't have to know compilers.

Coupling among layers

There are a couple of problems or difficulties in layering. For one thing, layers hide dependencies that could be used to advantage. For another, it's hard to take part of a layer - pieces within a layer tend to intermingle.

Recall how we said the hardware engineer doesn't need to know the application. But suppose this application is the focus of our system. Anything irrelevant to the application adds to the cost but doesn't help it work any better.

For another example, machines in the 1960s and 1970s were evolving to more "expressive" and complex machine instructions. At the same time, users were moving away from assembly language and into higher-level languages such as Fortran and C. Both hardware and applications evolved with limited influence on each other. At some point, researchers started looking at what code was actually compiled and run in real applications. They found that real programs tended to use the simplest modes and instructions. In some cases, there were modes and instructions that compilers could never generate - only an assembly-level programmer could use them, and those people were becoming more rare.

The unused instructions had a cost: they took time to develop and test, and they took space and material in the CPU. Sometimes other features in the CPU had to be compromised so these (mostly unused) instructions could be included. Hardware designers developed RISC (Reduced Instruction-Set Computer) CPUs by omitting the complex features (and pushing some work into the compiler). They found that by simplifying the instructions, the hardware could be made to run faster, and complexity could be added where it provided the most benefit. This didn't happen because of work at a single level - it happened when two (or more) layers were considered together.

Considering layers together is more complex. Suppose there are 6 connections from layer A to layer B, and 7 from B to C. If you look at A-B and B-C separately, you are considering only 6+7=13 connections (an additive effect). If you consider A-B-C together, you have to consider 6*7=42 connections (a multiplicative effect). When multiple layers are involved, and more connections, this multiplicative cost can be very expensive.

Pure vs. Cumulative Layers Interface

Should an upper layer be able to “see” below the layer that supports it? On one hand, if we only expose one layer to another, we maintain a more narrow interface. This can help in managing intellectual control. It will be easier to change lower interfaces with less impact on upper layers. We might think of such pure layering as looking like this:
On the other hand, features useful at a lower layer are often useful to upper layers as well. We might think of cumulative layers like this:
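In code, the two styles might be sketched as follows. The classes are hypothetical stand-ins for a network layer, an NNTP layer, and a UI layer; none of them is a real API, and `send` only pretends to transmit a line. In the pure style the UI holds only an `Nntp` reference; in the cumulative style it also reaches down to `Network` directly.

```java
// Bottom layer (hypothetical): pretends to send a line over a connection.
class Network {
    String send(String line) { return "sent:" + line; }
    boolean isConnected() { return true; }
}

// Middle layer: built only on Network.
class Nntp {
    private final Network net;
    Nntp(Network net) { this.net = net; }
    String fetchArticle(int id) { return net.send("ARTICLE " + id); }
    // Pure layering makes the middle layer re-export anything from
    // below that the top layer needs, even as a plain forward.
    boolean isConnected() { return net.isConnected(); }
}

// Top layer, pure style: sees Nntp and nothing below it.
class PureUi {
    private final Nntp nntp;
    PureUi(Nntp nntp) { this.nntp = nntp; }
    String status() { return nntp.isConnected() ? "online" : "offline"; }
    String show(int id) { return nntp.fetchArticle(id); }
}

// Top layer, cumulative style: may use any lower layer directly.
class CumulativeUi {
    private final Nntp nntp;
    private final Network net; // extra dependency: Network is harder to swap out
    CumulativeUi(Nntp nntp, Network net) { this.nntp = nntp; this.net = net; }
    String status() { return net.isConnected() ? "online" : "offline"; }
    String show(int id) { return nntp.fetchArticle(id); }
}

class LayerStyleDemo {
    public static void main(String[] args) {
        Network net = new Network();
        Nntp nntp = new Nntp(net);
        System.out.println(new PureUi(nntp).status());
        System.out.println(new CumulativeUi(nntp, net).show(42));
    }
}
```

Both versions behave the same here. The difference is in what each top-level class must be recompiled or rethought against when `Network` changes.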
At the top, we can use any lower interface. This gives us more features, but it can be more difficult to swap out a layer.

A Note on Diagrams

Generally, draw boxes to show dependencies: A over B implies A depends on B.
(Here, A uses both B and C.) Boxes at the same level are peers, and shouldn’t be assumed to depend on each other. (This is not a strict rule. [TBD]) You can adopt the convention of a dotted line showing that peers may depend on each other, or you can use “+” to show that they go together:
(Here, A depends on B and C, and they may depend on each other.) You may sometimes introduce a box just to close off the dependencies. (Sometimes this box is left unnamed.)
(Here, B depends on C and D, but A depends only on B (directly), or we leave B anonymous.) Box notation can’t handle arbitrary dependencies.
We can’t draw this with stacked boxes. We can sometimes develop a clever picture:
Or we can use repetition:
Or we might just simplify the relationship:
Issue: Cycles
If you have a cyclical dependency, you can’t put it in proper layers - what would be on top? Solution: break the cycle. The real situation might be like this:
(where the original A becomes A’’ + A’.) Here, C only depends on the A’’ part. Then, we can layer:

A’
B
C
A’’

Or, we might be able to introduce an abstract class of some sort:

A
B
C
AA

[TBD - R. Martin - Needs example]

Performance Implications

Layered software has implications for performance. We’ll address these areas:

· Cost of layer transitions
· Mismatched interfaces
· Suboptimal implementations
· Duplicate implementations

Cost of layer transitions

When we have software in layers, we may have a high-level request that travels down to lower layers, which implement it.

A->B->C

This increases cost in several ways. First, there is overhead in getting the message through the layers. In the example, A calls B and B calls C. Each call has a cost. In many situations, these will “just” be procedure calls (although those can add up), but in some distributed systems, the calls across layers involve a network connection. Those are expensive.

A second cost is the cost of introducing intermediate layers that have no purpose other than forwarding a message. (This is most common with pure layering, where each layer interface fully supersedes the layers below it.) In our example, the whole cost of B might fall into this category.

Finally, because these are in separate layers, we pay a price because of the context. When we call A, it might have been better if we could just substitute the work done by C (in effect inlining it). Inlining might expose opportunities for further optimization, unavailable when C is buried beneath layers.

Solution: Balance costs. Be aware of transition costs (especially in distributed systems). Balance out hiding and flexibility versus performance costs, especially in pure layering.

Mismatched Interfaces

Sometimes, a layer provides an abstraction that doesn’t quite meet the needs of the calling layer. The upper layer must spend effort to map this to the needed interface.

For example, consider the news reader again. It might provide an abstraction “Enumeration of articles” corresponding to the list of articles retrieved from the server. Some implementations are content to show articles in the server’s order, so this interface might be fine for them. Other implementations will want to provide the ability to sort articles by date or author. They would have preferred a different interface - perhaps an array with sorting abilities. Since the lower layer only provides the enumeration, the upper layer might take that list, copy it into a vector, and add the sorting feature.

Note what’s happened: we’ve effectively changed the performance characteristics of our implementation. The mismatched interface causes extra work (fetching the whole list and copying it to a vector). We knew we’d pay for the sorting, but we didn’t expect to pay for the copy.

Finally, there’s often a kicker: the lower layer may be using the desired interface already - it just didn’t expose it so it could retain flexibility of implementation.

The “solution” to this sort of problem is just careful design. Keep in mind the potential needs of upper layers, and try not to lock out a beneficial solution that could support them.

Suboptimal Implementations

Sometimes we take advantage of a layer’s service even when we’d be better off implementing our own solution. The abstraction may be right, but there might be a better concrete implementation.

A good example of this is the Hashtable in JDK 1.1. A Hashtable maps a key (often a string) to an object. This is a common need, so it’s easy to take any such mapping we have, and program it to use the Hashtable object. Hashtable is a fine class, but it’s not optimal for all mappings. There are just too many variations of needs: small or large sets, frequent or rare changes, types of keys, and so on. You used it for expediency, but you pay a runtime penalty relative to what you might have written.
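One way to keep the option of a tuned replacement open is to program against an interface, so the concrete table can be changed after profiling. The sketch below is an assumption about how that might look using the `java.util.Map` interface from Java 2; `ThreadIndex` and its methods are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical upper-layer class: maps a subject line to an article count.
// It depends on the Map interface, not on a particular implementation.
class ThreadIndex {
    private final Map<String, Integer> counts;
    ThreadIndex(Map<String, Integer> storage) { this.counts = storage; }

    void record(String subject) {
        Integer old = counts.get(subject);
        counts.put(subject, old == null ? 1 : old + 1);
    }

    int count(String subject) {
        Integer n = counts.get(subject);
        return n == null ? 0 : n;
    }

    Iterable<String> subjects() { return counts.keySet(); }
}

class MapChoiceDemo {
    public static void main(String[] args) {
        // Start with a general-purpose choice...
        ThreadIndex quick = new ThreadIndex(new HashMap<String, Integer>());
        // ...and switch to a sorted implementation if the need arises.
        ThreadIndex sorted = new ThreadIndex(new TreeMap<String, Integer>());
        for (String s : new String[] {"zebra", "apple", "zebra"}) {
            quick.record(s);
            sorted.record(s);
        }
        System.out.println(quick.count("zebra"));  // same count with either map
        System.out.println(sorted.subjects());     // TreeMap iterates keys in sorted order
    }
}
```

`ThreadIndex` does not change when the storage does, so the decision about which implementation to use can wait until a profile shows that it matters.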
(Note: Java 2 defines a Map interface with a couple of different implementations, tuned for different data characteristics.)

[KEY] Most of the time, you want to use library objects. Don’t avoid Hashtable or anything else that makes you productive. When you assess performance, and an object profiles as being expensive, consider providing a tuned concrete implementation.

[TBD - use of interfaces to allow this flexibility]

Duplicate Implementations

In a large system, you’ll find that some objects re-appear in various layers.

+-------------------------------------------------+
| UI         | Command-parser                     |
+-------------------------------------------------+
| NNTP       | NNTP-parser                        |
+-------------------------------------------------+
| Arg-parser | Error-handler | File/Net-connector |
+-------------------------------------------------+

Here we see various parsers appearing in a pure layered system. In a large system, people will write these various utilities without realizing that others have done the same. You may pay the price of having various copies of essentially the same code.

There’s another aspect of this too: these various objects are often not as general and not as tuned as they could be. Instead of spending 3 units of time to write 3 parsers, spend 1 unit to write one, and 1 or 2 to optimize it. The result will often be more consistent and more effective.

From Layered to Interlocking

Objects in layered frameworks tend to connect not only down, but across as well:
(Notice that there are no arrows going from the lower layer up toward the top layer.) The problem comes when you only need part of a layer. Suppose we want the features in the upper left box (A). Mark those features and the ones they rely on:
Notice that as we move down layers, more features will tend to be marked. For example, an application at the top level will almost never be concerned with whether a particular assembly instruction is used. The application trusts the JVM or the operating system layers to make the most effective use of the lower layers.

Using only part of a layer

For example, QuickTime video [TM Apple] was originally available only on the Apple Macintosh. Apple decided to make it available on Windows as well.

Media layer    QuickTime + other stuff
OS layer       Mac OS
Hardware       68K, PPC

If they want to deliver QuickTime only, they want as little non-QuickTime stuff as possible to be included. When you look at a piece of a layer, it looks like this:

Interesting piece (g) | Dependent stuff (y) | Unrelated stuff (r)

You also need things from the layer below:

Stuff the interesting stuff depends on -> Dependent stuff (y) | Stuff the upper dependent stuff relies on | Unrelated (r)

To deliver the interesting piece, you have to also deliver the dependent stuff. The question becomes: how big is the yellow piece? If you're unlucky (or not careful), the whole layer will be yellow.

[TBD] Use of interface or abstract classes helps decouple things - making fewer dependencies. This makes it easier to pull out part of a layer. (See the later discussion on interlocking frameworks.)

There's an example of this problem in the Java area: the efforts to define EmbeddedJava and PersonalJava, "reduced" environments for minimal (e.g., embedded) systems. The designs have had to decide what to pull in from the full Java. They've had to be careful - they don't want one piece to pull in all the others. For example, EmbeddedJava does not mandate a file system or network connection. But Object depends on Class, which has the forName() method that loads classes from the file system. To run on restricted systems, they've had to define what to do in that situation.

Are layers bad? No, they just have limitations (especially if you intend to break up a layer). At their best, layers provide barriers that let us consider two layers at a time. This lets you build software much more efficiently than when you have to consider all layers simultaneously.

Exploring Dependencies

In Java, a ClassLoader is responsible for loading a class into memory so it can be run. The ClassLoader is a runtime object, usually loading classes as needed. Before Java, languages like C or C++ typically used a linker. This tool analyzes the object files and their dependencies, and produces a static executable file that contains all needed object files. Notice that a ClassLoader works during runtime while a linker works before runtime.

To analyze dependencies, we need to think like a linker: “Given that we need this, what else could we possibly need?” There are various levels of granularity we could use. Most commonly, we look at either the object level or the package level.
Since classes within a package are usually developed and managed together, and often know about and depend on each other, it’s often most appropriate to work at the package level.

Another variable for dependency analysis is what level of access implies a dependency: public, protected, private/package. If we look at public access, we’re considering the implications of the interface for clients of the package. If we look at protected access, we’re considering packages or classes that subclasses will need. If we look at private and package-level access, we’re considering what is used to implement the package.

[TBD - chart]

Suppose we want to layer the core Java packages (java.lang, java.awt, etc.). Clearly, java.lang.Object must be in the bottom layer, as all objects depend on it.

Package      Class
java.lang    Object

[TBD - duplicate??]

We’ll consider the public and protected access, and see what else we require. Most of Object’s methods deal with simple types (such as int) which we’ll assume as given. But getClass() returns a Class, and toString() a String. The wait(), clone(), and finalize() methods introduce InterruptedException, CloneNotSupportedException, and Throwable.

If our analysis were at the class level, we’d carefully go through each of those classes, seeing what they pull in. Since we’re working at the package level, once a package is mentioned, we assume all its classes are included as well. So, we’ve pulled in all of java.lang. We sort of expected this, since those classes are so pervasive we don’t even require an import statement for them.

What classes and packages does java.lang require? There are a lot of exception classes that mostly depend on each other and String. There are wrapper classes for the basic types (such as Integer); these don’t introduce anything new. Compiler, Thread, and ThreadGroup don’t add anything either.
Then we get to the troublemakers:

Class: java.lang.reflect.Constructor, .Field, and .Method; java.net.URL; java.io.InputStream
ClassLoader: java.net.URL, java.io.InputStream
Process: java.io.InputStream, .OutputStream
Runtime: java.io.InputStream, .OutputStream
SecurityManager: java.net.InetAddress; java.io.FileDescriptor
System: java.io.InputStream, .PrintStream; java.util.Properties

This brings in new packages: java.io, java.lang.reflect, java.net, and java.util. (We determined this by going through the class definitions.)

Should we have expected this? Pretty much. We knew that network access is built right into Java, so the occurrence of I/O and URLs seems natural. Java has reflection for all objects, so this is appropriate too. Finally, utility packages are common, so that’s fine too.

We’re not done though: the layer also needs to include what those packages need.

java.io: These classes depend on basic types, String, java.lang, or each other.
java.lang.reflect: Nothing new; depends on java.lang.
java.net: Depends on java.lang and java.io but introduces nothing new.
java.util: Depends on java.lang and java.io but introduces nothing new.

Think how lucky we were with the java.util package. This is a grab-bag package: containers, random number generator, observer, time/date, etc. If any of these classes (essentially having nothing to do with Object per se) had required other packages, we’d have had to pull them in as well.

java.lang | java.lang.reflect | java.io | java.util

Where does awt fit? The java.awt package needs java.awt.event, java.awt.peer, java.awt.datatransfer, and java.awt.image. The java.applet package only depends on java.awt directly. The java.beans, java.math, java.text, and java.util.zip packages are each pretty self-contained.

awt (applet) || beans || math || text || zip

How should we layer these? Since they don’t really depend on each other, it isn’t right to have awt above zip, or vice versa.
Putting them in the same layer is OK, but it couples the packages together:

+-----------------------------------------------------------------------+
| applet |                              ||       ||      ||      ||     |
+--------+                              ||       ||      ||      ||     |
| event | peer | dt | image             || beans || math || text || zip |
|                  awt                  ||       ||      ||      ||     |
+=======================================================================+
| lang | reflect | io | util                                            |
+-----------------------------------------------------------------------+

What does the coupling do? Already, lang, reflect, io, and util must be understood together. Similarly for awt.*. If we provide access to all these classes, upper layers will tend to use them (and that’s certainly happened for the Java libraries).

We’re at the edge of layered systems - to go further, we’ll look at interlocking frameworks.
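The "think like a linker" walk-through in this section amounts to computing a transitive closure over a dependency graph. The sketch below hard-codes the package-level dependencies reported in the prose; it does not inspect real class files, and the table is only as accurate as that summary. `PackageClosure`, `directDeps`, and `closure` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PackageClosure {
    // Direct package-level dependencies, transcribed from the walk-through.
    static Map<String, List<String>> directDeps() {
        Map<String, List<String>> deps = new HashMap<>();
        deps.put("java.lang",
                 Arrays.asList("java.lang.reflect", "java.net", "java.io", "java.util"));
        deps.put("java.lang.reflect", Arrays.asList("java.lang"));
        deps.put("java.io", Arrays.asList("java.lang"));
        deps.put("java.net", Arrays.asList("java.lang", "java.io"));
        deps.put("java.util", Arrays.asList("java.lang", "java.io"));
        return deps;
    }

    // "Given that we need this, what else could we possibly need?"
    static Set<String> closure(String start, Map<String, List<String>> deps) {
        Set<String> seen = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(start);
        while (!work.isEmpty()) {
            String pkg = work.pop();
            if (!seen.add(pkg)) {
                continue; // already visited; this also stops cycles from looping
            }
            List<String> next = deps.get(pkg);
            if (next != null) {
                for (String p : next) {
                    work.push(p);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(closure("java.lang", directDeps()));
    }
}
```

Starting from java.lang, the closure contains all five packages named in the prose. Starting from any one of them gives the same set, which is another way of saying they must be understood together.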
Java Support for Layers

[TBD] Missing linker support - explicitly export symbols.

Chapter Summary

[TBD]
Copyright 1994-2006, William C. Wake - William.Wake@acm.org