OSGi: Because Sharing Shouldn't Be Painful
At the dawn of Windows, applications were statically linked and life was simple; applications carried their own dependencies so there was never confusion. However, this simplicity had its costs.
Obviously, there was the storage cost, as each application carried its own copies of the same libraries. At the time a 500 MB disk was considered huge, so this duplication was expensive. There was also the problem of bug fixes. If a security bug was found, each and every application had to be rebuilt to ensure the bug was fixed everywhere. Organizing an update of all installed applications was a nightmare, and often impossible. Finally, certain libraries had to come from the platform, for example device drivers and the operating system libraries. These libraries, by definition, could not be linked statically.
The solution Microsoft came up with was the Dynamic Link Library (DLL), a concept long familiar to Unix users at the time under the (arguably better) name of shared library. Here's how it worked: when an application was loaded, the loader would automatically load the required DLLs into memory and link the application to their code and data. DLLs could depend on other DLLs, so the loader might have to link DLLs recursively, resulting in an often complex dependency graph. The developers looked at it and saw that it was good. That is, until the feature became heavily used. Then all hell broke loose. Why? Because sharing is hard, as we first learn in kindergarten.
The problem with sharing in software is that all participants have to agree on what is shared, for now and forever. Application developers are known to take shortcuts, and hacking a library to get that extra feature or to patch that nasty bug will always remain popular. Unfortunately, one developer's fix is another developer’s horror, especially when that other developer treated the bug as a feature. So a modified DLL was not always compatible with all the applications that shared it. In C++, even adding a method to a class (a virtual method, for example, changes the vtable layout) can make already compiled code incompatible with the DLL. Application developers did not always consider the consequences of sharing and stomped on (overwrote) the installed DLLs, causing other applications to crash and turning many system administrators grey long before their time.
Initially, DLLs were not even versioned. Versioning was added later, but few developers, Microsoft included, cared enough to version their DLLs correctly, so versioning did little to solve the mess. And even when all the DLLs were versioned and stamped correctly, one could still run into unsolvable situations caused by inconsistencies in the dependencies.
Let me explain. In a dependency graph you can depend on the same DLL through different paths, and the constraints along each path can differ. For example, assume there is a popular, general-purpose LOG DLL. Both a WIDGET DLL and a COMM DLL use this library, and an application (App) uses both WIDGET and COMM. See the adjacent image for a visual of the situation.
If COMM.DLL requires version 1 of LOG.DLL and WIDGET.DLL requires version 2, then the loader must make a choice, because LOG.DLL can be linked only once, in either version 1 or version 2. This is the so-called "diamond" problem, named after the shape of the graph. In large dependency graphs these kinds of inconsistencies are virtually unavoidable unless there is a single code base in which all the components evolve at the same rate. In a world with so much free open source software, this has become quite unlikely.
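To make the conflict concrete, here is a minimal Java sketch of a loader that, like the DLL loader described above, can link only one version of each library. The class `SingleVersionLoader`, the `Requirement` record, and the integer versions are hypothetical illustrations, not any real loader's API; the point is only that no single LOG version satisfies both paths through the diamond.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical model of a loader that can link exactly one version of each library.
public class SingleVersionLoader {

    // A requirement on an exact version of a library, e.g. COMM needs LOG v1.
    record Requirement(String requester, String library, int version) {}

    // Returns the single version of `library` that satisfies every requirement,
    // or empty if the requirements disagree (the diamond conflict).
    static Optional<Integer> pickVersion(String library, List<Requirement> reqs) {
        Integer chosen = null;
        for (Requirement r : reqs) {
            if (!r.library().equals(library)) continue;
            if (chosen == null) {
                chosen = r.version();
            } else if (chosen != r.version()) {
                return Optional.empty(); // two paths demand different versions
            }
        }
        return Optional.ofNullable(chosen);
    }

    public static void main(String[] args) {
        // App -> COMM -> LOG v1 and App -> WIDGET -> LOG v2: the diamond.
        List<Requirement> reqs = List.of(
            new Requirement("COMM.DLL", "LOG.DLL", 1),
            new Requirement("WIDGET.DLL", "LOG.DLL", 2));
        System.out.println(pickVersion("LOG.DLL", reqs)); // Optional.empty
    }
}
```

If both DLLs asked for the same version, `pickVersion` would simply return it; the failure only appears when the paths disagree.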
One solution to the diamond problem is to always be backward compatible. If LOG.DLL v2 is backward compatible with v1, then at least a clever loader can pick the latest version. However, forcing future generations to remain backward compatible forever with today's not always optimal choices is cruel to the developers who come after us.
Another solution to sharing problems is to isolate: just stop sharing. To solve DLL hell, Windows therefore allowed applications their own DLL search path so they could have private copies of a DLL. This was back to square one with some additional complexity: wasted memory, no single place to fix a bug, etc.
With all the problems that sharing has, is it actually worth the effort? In Java, the WAR concept is extremely popular, and it clearly isolates: every WAR contains all its dependencies except for the Java environment and the libraries provided by the Application Server. Yes, there are significant deployment problems because large companies replicate the same library in many different WARs, causing upload delays. At some sites, VMs are bursting at the seams trying to handle the enormous amount of often unnecessary bytecode. However, there are solutions to all these problems, and it should be clear that sharing has its fair share of problems as well.
So, to share, or not to share, that is the question.
I believe that sharing is inevitable, and this belief has driven my work since the eighties. Software should be built from components that collaborate on a peer-to-peer basis through explicit interfaces. This is the only way to solve the big ball of mud problem where the maintenance and modification of a code base becomes harder and harder due to its entanglement. If your software dependency diagrams look like a Jackson Pollock, you have such a big ball of mud.
In a component model, we must separate the roles of the developer (who makes components) and the assembler (who groups components into an application). Much work in our industry points in the direction of greater autonomy for the different software subsystems. SOA is clearly based on a peer-to-peer model, but large software libraries also have their own threads and models of control that are independent of the main application. Open source projects are really close to components, but unfortunately we have not settled on a module system that would make these components so much easier to use.
The logical consequence of this greater autonomy is the need to share interfaces and to link components in as late as possible to maximize flexibility. Component developers should preferably not even see collaborating components, to prevent unwanted dependencies on an implementation. The less a component can assume, the fewer bugs it will have. Only the contract should be visible to a component during development. In this model, the assembled set of components forms the application.
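A minimal Java sketch of this separation, assuming plain interfaces rather than any particular framework: the consumer is written only against the contract (`Log`), and an assembler decides late which implementation to wire in. All names here (`Log`, `MemoryLog`, `Mailer`, `Assembler`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class Assembler {

    // The contract: the only thing both sides need to see.
    interface Log {
        void log(String message);
    }

    // A provider implements the contract.
    static final class MemoryLog implements Log {
        final List<String> lines = new ArrayList<>();
        @Override public void log(String message) { lines.add(message); }
    }

    // A consumer's constructor accepts only the Log contract; in a real module
    // system the provider class would not even be visible to it.
    static final class Mailer {
        private final Log log;
        Mailer(Log log) { this.log = log; }
        void send(String to) { log.log("sent mail to " + to); }
    }

    // The assembler's role: choose an implementation and connect the components.
    public static void main(String[] args) {
        MemoryLog log = new MemoryLog();
        Mailer mailer = new Mailer(log);
        mailer.send("alice@example.com");
        System.out.println(log.lines); // [sent mail to alice@example.com]
    }
}
```

Here the assembler is a hand-written `main`; in OSGi this late binding is typically handled through the framework's service registry instead.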
In 1998, with this collaborative model in mind, we started to develop OSGi. We wanted a model where components from different vendors would collaborate in a peer-to-peer fashion without prior awareness of each other: not silos in a constrained environment, but components that could find other components and work together. To achieve this goal we knew we had to address the sharing problem, a.k.a. DLL Hell.
So what did we learn about modularity in those 13 years?
Modularity that is not enforced is not modularity. Unless someone slaps you on the wrist when you violate a module boundary, you will not be working modularly. This is the greatest disappointment for people moving to OSGi: finding out that their code base is not nearly as modular as the architecture and dependency diagrams had led them to believe. Unfortunately, many people tend to get upset and blame the messenger.
By far the best solution to the problem of sharing is not sharing. Anything shared has a hidden cost in later stages of the life of a component. Any shared aspect must be carefully versioned over time and is restricted in its evolution. Sharing is expensive! For this reason, OSGi gives you a private Java name space in your bundle to hide as much code as possible from the rest of the world. Whatever you do there is guaranteed not to affect anybody outside your bundle. Different bundles can actually contain the same classes without causing naming problems. Sharing is good, hiding is better, and not having the thing in the first place is best.
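The visibility rule can be modeled in a few lines. This is a minimal sketch, not how an OSGi framework is implemented (real frameworks enforce it with per-bundle class loaders); the `Bundle` record, `canSee` method, and the `com.acme` package names are hypothetical. A bundle always sees its own classes, but another bundle's classes only if their package is exported.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical model of bundle visibility; real frameworks enforce this with class loaders.
public class BundleVisibility {

    // A bundle holds classes grouped by package and exports only some packages.
    record Bundle(String name, Map<String, Set<String>> packages, Set<String> exports) {}

    // Package of a fully qualified class name, e.g. "com.acme.log.Logger" -> "com.acme.log".
    static String packageOf(String className) {
        int dot = className.lastIndexOf('.');
        return dot < 0 ? "" : className.substring(0, dot);
    }

    // Can `requester` see `className` as provided by `provider`?
    // Its own classes (same bundle instance) are always visible; others only if exported.
    static boolean canSee(Bundle requester, Bundle provider, String className) {
        String pkg = packageOf(className);
        Set<String> classes = provider.packages().get(pkg);
        if (classes == null || !classes.contains(className)) return false;
        return requester == provider || provider.exports().contains(pkg);
    }

    public static void main(String[] args) {
        Bundle log = new Bundle("log",
            Map.of("com.acme.log", Set.of("com.acme.log.Logger"),
                   "com.acme.log.impl", Set.of("com.acme.log.impl.FileLogger")),
            Set.of("com.acme.log"));
        Bundle app = new Bundle("app",
            Map.of("com.acme.app", Set.of("com.acme.app.Main")),
            Set.of());
        System.out.println(canSee(app, log, "com.acme.log.Logger"));          // true
        System.out.println(canSee(app, log, "com.acme.log.impl.FileLogger")); // false
        System.out.println(canSee(log, log, "com.acme.log.impl.FileLogger")); // true
    }
}
```

Because every lookup goes through a specific provider, two bundles could each keep a private `com.acme.log.impl.FileLogger` without clashing.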
Cost of Dependencies
Every dependency has a cost, but you'd be surprised how many people do not know exactly where their dependencies come from. I once found that more than 30 MB of dependencies were dragged in by a single (unnecessary) reference to Apache Commons Collections. Many WARs and Maven projects have a very fuzzy dependency chain because it is just too hard to figure out what the real dependencies are in today’s Java.
We found that for Java, the package is the right granularity for sharing. Classes are too fine-grained to share; they are too tightly connected to the neighboring classes in their package. Exporting individual classes would also overlap awkwardly with Java's packages. JARs are cumbersome to share because they are aggregates of the packages that your code actually shares. This aggregation has nasty side effects on versioning and dependency management. A later article will explore the concept of package sharing more deeply, since it is one of the least understood and most undervalued aspects of OSGi.
Unexpected Consequences of API-based Programming
API-based programming, where an API can be implemented by different parties, requires us to revisit versioning as we know it. In Unix package management systems, the grandfathers of software dependency models, versions are based on a consumer that requires a provider. There is a tendency in Java to blindly follow this model, but this is a mistake. It is usually wise not to reinvent the wheel, but with an API-based programming model there is actually a third party: the API/interface/contract. It turns out that this triad has profound implications for the compatibility rules and thus for the dependency model.
Connecting components properly requires resolving their requirements against the capabilities of other components. In such a wiring model it is paramount that packages are versioned correctly. In OSGi, an exporter declares its version and an importer can declare the range of versions with which it is compatible. The OSGi specification outlines the rules for semantic versioning, a model where each part of a version has a well-defined compatibility meaning, unlike almost all other (fuzzy) schemes, which can only demand perpetual backward compatibility and do not recognize the consumer/interface/provider triad. The OSGi version scheme is strong enough to be calculated by automated tools, a necessity because humans are awfully bad with versions.
In OSGi, you can clearly and unequivocally declare that you're not compatible with a prior version. Being able to express you don’t support backward compatibility with older versions is essential for a component system.
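The following is a minimal sketch of OSGi-style versions and ranges, written from scratch for illustration; it is not the `org.osgi.framework` API and it ignores version qualifiers. It follows the range syntax used in OSGi manifests (`[` and `]` inclusive, `(` and `)` exclusive, a bare version meaning "this or later") and shows how the triad plays out: a consumer of an API can accept any later minor version, while a provider that implements it must exclude the next minor version, which may add methods it does not implement. The concrete ranges `[1.2,2)` and `[1.2,1.3)` are illustrative.

```java
// Minimal sketch of OSGi-style versions and version ranges (not the org.osgi.framework API).
public class SemanticRanges {

    // major.minor.micro; qualifiers are omitted in this sketch.
    record Version(int major, int minor, int micro) implements Comparable<Version> {
        static Version parse(String s) {
            String[] p = s.trim().split("\\.");
            int major = Integer.parseInt(p[0]);
            int minor = p.length > 1 ? Integer.parseInt(p[1]) : 0;
            int micro = p.length > 2 ? Integer.parseInt(p[2]) : 0;
            return new Version(major, minor, micro);
        }
        @Override public int compareTo(Version o) {
            if (major != o.major) return Integer.compare(major, o.major);
            if (minor != o.minor) return Integer.compare(minor, o.minor);
            return Integer.compare(micro, o.micro);
        }
    }

    // A range such as "[1.2,2)": '[' and ']' are inclusive, '(' and ')' exclusive.
    // A bare version such as "1.2" means "1.2 or later" (no upper bound).
    record VersionRange(Version low, boolean lowIncl, Version high, boolean highIncl) {
        static VersionRange parse(String s) {
            s = s.trim();
            char first = s.charAt(0);
            if (first != '[' && first != '(') {
                return new VersionRange(Version.parse(s), true, null, false);
            }
            char last = s.charAt(s.length() - 1);
            String[] bounds = s.substring(1, s.length() - 1).split(",");
            return new VersionRange(Version.parse(bounds[0]), first == '[',
                                    Version.parse(bounds[1]), last == ']');
        }
        boolean includes(Version v) {
            int lo = v.compareTo(low);
            if (lo < 0 || (lo == 0 && !lowIncl)) return false;
            if (high == null) return true;
            int hi = v.compareTo(high);
            return hi < 0 || (hi == 0 && highIncl);
        }
    }

    public static void main(String[] args) {
        // A consumer only calls the API, so it accepts any later minor version.
        VersionRange consumer = VersionRange.parse("[1.2,2)");
        // A provider implements the API; a new minor version may add methods it lacks.
        VersionRange provider = VersionRange.parse("[1.2,1.3)");

        Version newer = Version.parse("1.3.0"); // API gained a method: minor bump
        System.out.println(consumer.includes(newer)); // true
        System.out.println(provider.includes(newer)); // false
        // Declaring incompatibility with older versions: 1.1 is simply outside the range.
        System.out.println(consumer.includes(Version.parse("1.1"))); // false
    }
}
```

Because the ranges are explicit, excluding an older version is just a matter of choosing the lower bound, which is what makes the "not compatible with a prior version" statement expressible at all.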
Dependency on Multiple Versions
Multiple versions of the same package are not something to which one should aspire, but when enough external dependencies are used in an application they become inevitable. In the Java community there is a tendency to ignore those pesky multiple versions as if they cause no problems. For example, Maven picks the version nearest to the project in the dependency tree (the first declared one on a tie), even if another dependency requires a higher version. It is not uncommon to find multiple versions of the same library on a large class path. This is odd: we are willing to invest a lot in type safety in the language, but then throw many of its advantages away at runtime by being sloppy. In OSGi, dependencies are carefully managed; it is an exact science, with no fuzziness or heuristics. OSGi frameworks handle the diamond problem by wiring each bundle to a version it is compatible with, while only allowing collaboration between bundles whose wirings do not conflict. In the previous DLL example, OSGi could easily handle the collaboration between App and COMM and between App and WIDGET, as long as they did not exchange LOG objects. The fact that OSGi supports collaboration and sharing, but also isolation when needed, is one of the key reasons Application Server vendors use OSGi heavily.
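A greatly simplified Java sketch of this behavior, using the App/COMM/WIDGET/LOG example: each importer is wired to the highest exported version inside its own range, so both LOG versions can be active at once, and a separate check decides whether two bundles may pass LOG objects to each other. This is only a stand-in for what a real framework does (it ignores, among other things, the `uses` constraints that OSGi exporters declare); the `resolve` and `canExchange` methods and the integer versions are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Greatly simplified model of per-bundle wiring; real frameworks also apply "uses" constraints.
public class MultiVersionWiring {

    record Export(String bundle, String pkg, int version) {}
    // The importer accepts versions in [low, high).
    record Import(String bundle, String pkg, int low, int high) {}

    // Wire each import to the highest exported version inside its own range.
    static Map<String, Map<String, Integer>> resolve(List<Export> exports, List<Import> imports) {
        Map<String, Map<String, Integer>> wiring = new HashMap<>();
        for (Import imp : imports) {
            Optional<Integer> best = exports.stream()
                .filter(e -> e.pkg().equals(imp.pkg()))
                .map(Export::version)
                .filter(v -> v >= imp.low() && v < imp.high())
                .max(Integer::compare);
            best.ifPresent(v ->
                wiring.computeIfAbsent(imp.bundle(), k -> new HashMap<>()).put(imp.pkg(), v));
        }
        return wiring;
    }

    // Two bundles can exchange objects from `pkg` only if they see the same version of it.
    // If either side does not import `pkg`, no such objects cross between them.
    static boolean canExchange(Map<String, Map<String, Integer>> wiring, String a, String b, String pkg) {
        Integer va = wiring.getOrDefault(a, Map.of()).get(pkg);
        Integer vb = wiring.getOrDefault(b, Map.of()).get(pkg);
        return va == null || vb == null || va.equals(vb);
    }

    public static void main(String[] args) {
        List<Export> exports = List.of(new Export("LOG-1", "log", 1), new Export("LOG-2", "log", 2));
        List<Import> imports = List.of(
            new Import("COMM", "log", 1, 2),    // COMM needs LOG version 1
            new Import("WIDGET", "log", 2, 3)); // WIDGET needs LOG version 2
        var wiring = resolve(exports, imports);
        System.out.println(wiring.get("COMM").get("log"));   // 1
        System.out.println(wiring.get("WIDGET").get("log")); // 2
        // App does not import "log", so it can collaborate with both...
        System.out.println(canExchange(wiring, "App", "COMM", "log"));    // true
        // ...but COMM and WIDGET cannot hand LOG objects to each other.
        System.out.println(canExchange(wiring, "COMM", "WIDGET", "log")); // false
    }
}
```

Unlike the single-version loader, nothing here forces a global choice: the conflict is confined to the one interaction that would actually mix the two LOG versions.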
The majority of the OSGi Core specification is in the module layer where we define the rules for sharing. These rules are often complex, hard to understand, and require a lot of knowledge regarding name spaces, type systems, and Java’s inner bowels. However, these rules only have to be understood by a handful of OSGi framework developers. In return, OSGi gives modular developers a solid sharing model that is powerful and surprisingly easy to use.
It is therefore kind of sad to see how Jigsaw’s design is guided by Unix package systems such as Debian's and ignores OSGi. However powerful these package systems are, they solve a problem that only superficially resembles Java modularity. With Jigsaw, the Java community faces a long and winding road. Along the way it will become clear that OSGi is not the cause of complexity; it is just the messenger telling you that it has to run unmodular code.
27 Jul 2011