Managing Source Code for Complex Software

By Sebastien Tardif

01 Oct 2005 | TheServerSide.com

Introduction

Software projects are too often viewed as one entity. All the code is in the same tree. As the code grows and more aspects are added, the source code becomes unmanageable. Such code is costly to maintain. A way to decrease the complexity is to organize the code into many separate source trees instead of just one. Tools can help to manage these components, including the dependencies among them. Open source projects like Tomcat and Apache went through a similar process, which resulted in many smaller projects that now have a life outside their parent projects.

In this article, I discuss the importance of component design and of the tools used to manage the components. It should be helpful for anyone who can impact the code structure of software projects.

Complexities

As a consultant, I have to deal with new technologies, different business domains, and different ways of doing the same thing from one business to the next, all of which decrease my productivity. Even when I work on the same complex project for a long period, it's not always easy to remember the intricacies of a sub-section that I have to maintain for only 2-3 days every couple of months.

Wouldn't it be easier if tasks were confined to smaller projects, each with its own unit tests, documentation, and Ant/Maven scripts?

Let's go through the advantages of this approach.

Clear Model for New Developers

A component-based solution is easier to understand, especially for new developers, because every module has its own documents that highlight its different aspects. Test classes are organized by module, so they are easier to find and more uniform. Specifically, utility modules have unit tests, and modules with coarse-grained APIs have functional tests.

Leads Assigned to Modules

On the management side, we can more easily follow the good practice of assigning a lead developer to every piece of code. The lead is accountable for broken tests and defects logged against the module. Another important responsibility for the lead is to ensure the uniformity of the module's public API. Assigning lead developers to individual modules is more straightforward than doing the same for a single complex source tree.

Explicit Dependencies

On the communication side, high-level views of a system are always useful. Generating or drawing a dependency graph is the first step toward communicating the dependencies and, indirectly, the designs and functions of the components. A simple Visio diagram can make a huge difference. In subsequent iterations, grouping the components into layers and subsystems can help developers remember the components' dependencies. You can enforce these layers/dependencies with Eclipse.

Containment of Broken Builds

On a daily basis, things go wrong and builds break. Broken builds are often the result of developers being unaware of the dependencies between classes. Developers who work on a component can be certain that the rest of the project will compile as long as they don't modify the public API. Because a component has a more thought-out public interface, that interface should be more stable. Therefore, if a module breaks the build, swapping in a previous version of the module will likely allow the build to compile, as a temporary solution.

The same idea applies in production. When a patch is needed, the tendency is to release all the code again because it's safer. On the other hand, if a patch covers just the internal code of components, it's safe to just release/test a new version of the components.

More Deployment Options

On the deployment side, in some deployment scenarios, such as running on a PDA, smaller components make it easier to leave out unnecessary code. In addition, switching to a model where customers are charged by module is easier. However, clients or operations staff may be confused by so many JARs. A solution is to merge the JARs. The following example shows how easy merging JARs can be. This Ant zip task works because JAR files use the ZIP file format:

<zip destfile="${dist}/manual.jar">
    <zipfileset dir="htdocs/manual" prefix="docs/user-guide"/>
    <zipgroupfileset dir="." includes="examples*.jar"/>
</zip>

This example adds all files in the htdocs/manual directory to the docs/user-guide directory of the archive, and adds the contents of every file that matches examples*.jar, such as examples1.jar or examples2.jar.

Reusability Awareness

On the psychological side, an important benefit of having components instead of related classes all in the same source tree is that the components are more visible. Developers who add functionality are more likely to think of adapting an existing component instead of copying and pasting code.

Open Sourcing

On the business side, open sourcing is a way to share maintenance costs with the community. Not all components are good candidates. The components need to be useful to a large audience. Framework components or UI components are often the best candidates, for example, components that generate customizable calendars or tables in HTML.

Open sourcing components provides free code reviews, bug reports, and bug fixes, which results in better quality and more flexible and cheaper components. By open sourcing, I mean making the components' source code available for free, so that the community contributes at the source code level rather than at the level of defect or enhancement requests. Popular open source licenses range from copyleft licenses such as the GPL, which require that distributed modifications be released under the same license, to permissive licenses such as the Apache License, which grant full rights to modify the code and sell it without open sourcing the modifications.

Open source hosts, like http://www.apache.org/ and http://sourceforge.net/, provide many communication tools that are often under-used by private industry. You may learn better ways of managing your own components from these hosts.

Many Components Does Not Equal More Scripts

On the scripting side, more components seem to require more scripts. However, using inheritance and reusing methods in the scripts saves you from most duplication of code. In fact, having more components results in simpler scripts for each component. Consequently, the build is easier to comprehend.

JARs Mismatch

On the versioning side, when just one project uses the components, it's simpler to release all the components at the same time and version them together with the overall solution. When two projects share components, however, project 1 may be in code freeze while project 2 is still in development. In such cases, it's better to release the shared components independently in order to minimize component branching. Maven has in its core a flexible way to select either a specific version of a component or the latest version available.

Managing Dependencies

On the IDE side, Eclipse represents each component as its own project and supports declaring which projects (components) a project depends on. Dependencies in Eclipse can be marked as "exported." Here is an example of exported dependencies: if component 1 exports components 2 and 3 as direct dependencies and component 4 is set up to depend on component 1, component 4 also implicitly depends on components 2 and 3.
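
The implicit dependencies in this example can be worked out mechanically. The following Java sketch is only an illustration and not Eclipse's own implementation: the class ExportedDependencies, the method allDependencies, and the component names are hypothetical, and the sketch assumes that every dependency is marked as exported.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: resolves direct and implicit dependencies of a component. */
class ExportedDependencies {

    /**
     * Returns every component reachable from 'start' by following the
     * dependency map. Assumes that each dependency is exported, so it is
     * visible to the components further up the chain.
     */
    static Set<String> allDependencies(String start, Map<String, List<String>> exported) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> pending =
                new ArrayDeque<>(exported.getOrDefault(start, Collections.emptyList()));
        while (!pending.isEmpty()) {
            String next = pending.removeFirst();
            if (seen.add(next)) { // visit each component only once
                pending.addAll(exported.getOrDefault(next, Collections.emptyList()));
            }
        }
        seen.remove(start); // a component is not its own dependency
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> exported = new HashMap<>();
        exported.put("component1", Arrays.asList("component2", "component3"));
        exported.put("component4", Arrays.asList("component1"));
        // prints [component1, component2, component3]
        System.out.println(allDependencies("component4", exported));
    }
}
```

Component 4 declares only component 1, yet the result also contains components 2 and 3, which is the behavior described above.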

Faster Build

On the iterative side, building Java code is quite fast if you don't work with EJBs. However, test-driven development requires compiling so often that you want to optimize as much as possible. The script should build only the components that need to be updated. You could also let the script accept regular expressions that select which components it must include or exclude.
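
As a rough sketch of such filtering, the Java class below selects component names with one include pattern and one exclude pattern. The class ComponentFilter, the method select, and the component names are hypothetical; a real build script would take the patterns from its own properties or command line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper: picks the components that a build run should process. */
class ComponentFilter {

    /** Keeps names that fully match includeRegex and do not fully match excludeRegex. */
    static List<String> select(List<String> components, String includeRegex, String excludeRegex) {
        Pattern include = Pattern.compile(includeRegex);
        Pattern exclude = Pattern.compile(excludeRegex);
        List<String> selected = new ArrayList<>();
        for (String name : components) {
            if (include.matcher(name).matches() && !exclude.matcher(name).matches()) {
                selected.add(name);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("core", "core-tests", "ui-calendar", "ui-table");
        // prints [ui-calendar]
        System.out.println(select(all, "ui-.*", ".*-table"));
    }
}
```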

Reusable Dependencies Declaration

Most projects are set up with an IDE for development and a scripting language like Ant for automated release builds. Unfortunately, this is often done by duplicating the components' locations and dependencies. This should be avoided. While NetBeans uses Ant for its underlying project definitions, Eclipse doesn't. However, Eclipse project files are quite simple. Therefore, it's possible either to extract the dependencies from Eclipse projects so that Ant can reuse them, or to generate the Eclipse projects from Ant or from an external definition such as an XML file.

To show how simple an Eclipse project is, consider the two files that describe an Eclipse Java project: ".project" and ".classpath". The ".classpath" file looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<classpath>
    <classpathentry kind="src" path=""/>
    <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
    <classpathentry kind="src" path="/AnotherComponent"/>
    <classpathentry kind="output" path=""/>
</classpath>

Eclipse specifies the projects that the current project depends on using source entries (kind="src"). The entry /AnotherComponent is a project dependency because its path starts with a slash.

As you can see, generating or reading Eclipse project dependencies should be simple.
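
Reading the dependencies could look like the following Java sketch, which uses the standard JAXP DOM parser. The class ClasspathDependencies and the method projectDependencies are hypothetical names, and the sketch relies only on the rule stated above: a kind="src" entry whose path starts with a slash is a project dependency.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists the projects that a ".classpath" file depends on. */
class ClasspathDependencies {

    /** Returns the project names of kind="src" entries whose path starts with a slash. */
    static List<String> projectDependencies(InputStream classpathXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(classpathXml);
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable .classpath file", e);
        }
        NodeList entries = doc.getElementsByTagName("classpathentry");
        List<String> projects = new ArrayList<>();
        for (int i = 0; i < entries.getLength(); i++) {
            Element entry = (Element) entries.item(i);
            String path = entry.getAttribute("path");
            if ("src".equals(entry.getAttribute("kind")) && path.startsWith("/")) {
                projects.add(path.substring(1)); // drop the leading slash
            }
        }
        return projects;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<classpath>"
                + "<classpathentry kind=\"src\" path=\"\"/>"
                + "<classpathentry kind=\"con\" path=\"org.eclipse.jdt.launching.JRE_CONTAINER\"/>"
                + "<classpathentry kind=\"src\" path=\"/AnotherComponent\"/>"
                + "<classpathentry kind=\"output\" path=\"\"/>"
                + "</classpath>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        // prints [AnotherComponent]
        System.out.println(projectDependencies(in));
    }
}
```

A build script could feed the resulting list into its own dependency ordering instead of declaring the same dependencies a second time.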

Conclusion

In conclusion, I have presented a way to stop the proliferation of project complexity by modularizing the code into explicit components. We have seen that a component-based solution can help manage the diverse aspects of software development, from reusability awareness to open sourcing to better communication. Let's keep software simple.

About the Author

Sebastien Tardif is a software consultant in New England. He has been developing Web Services for the last 5 years. On the side, he has been trying to improve the development process for all his clients. Contact Sebastien at at925@freenet.carleton.ca.