Code generation is nothing new, especially for Java programmers, but it is still confusing to most people because of the variety of code generation models and solutions. This article will help you cut through the fog by providing a summary of the popular models and solutions in the Java world today.
To simplify matters we will concentrate on tools that aid in generating the code for the database back-end because, for reasons that will become clear, it is the ideal starting point for generation.
Code Generation to the Rescue
Let’s step back for a second. What is the best part about software engineering? For most of us it is the creation of useful products for our customers. The look in someone’s eye when you deliver a genuinely critical piece of software that is beneficial to them. The technology is important, but only as a means to an end. For example, the code that gets information in and out of the database, what Sun likes to call plumbing, is interesting, but it’s not why we got into this business.
It’s ironic that Java, a language and a set of tools invented by Sun, requires a lot of plumbing code to simply get data in and out of the database. There are a number of persistence options, and some options, like EJB, take up to seven classes and interfaces per database table. It’s no great wonder that, of the 150 tools listed in the Code Generation Network database, the majority are for Java, and the majority of those are for building database persistence code.
A high quality generator will have these characteristics:
- It will produce quality code. The kind of code we would write ourselves if we had the time.
- It will be standards compliant when applicable.
- It will build the majority of the grunt work code.
- It will be flexible to changes in the requirements of the application.
- It will provide us flexibility in terms of technology and deployment options.
- It should be reasonably well documented and supported.
- It should provide a good return on our investment.
It’s the last item, the return on investment, that you should be looking for as we go through the generators. If you are going to put a lot of effort into getting your code or model into a form that is usable by the generator, it should give you a nice return on that effort. Of course it will build the code, but what else can it do for you? Documentation? Models? RPC interfaces? Unit tests? Sometimes this return extends past the tool itself. So as we go, ask yourself whether you understand how the generator works and how you could extend it yourself.
We will start off the survey of tools by looking at those you can get off the shelf.
Out of the Box Generation
All generators are custom generators at some level. Some are built from scratch for a specific application or environment. Other generation solutions are built using an off-the-shelf generator and customizing it to suit your needs. When you use something off the shelf you leverage the documentation, the standards, and the user base. Because of these advantages we start by looking at the code generators for Java you can use today.
Getting More From Your Code
In the extreme programming model the code is the documentation and the most accurate business model for the application. To fit with this model there are generators that use the code as the definition of the input model. When we are generating database code this means the generator will be analyzing the existing beans and creating the infrastructure classes and interfaces to support those beans.
The primary example of this model is XDoclet. Using the comments in existing Java beans the XDoclet engine can build database access code based on a variety of different technologies. These include EJB/BMP, JDO, Castor and Hibernate. In addition, with XDoclet's extensible architecture it can now generate code for JMX, Web Services and Struts. Manning covers this topic with its new book, XDoclet in Action.
Another version of this is the Eclipse Modeling Framework generation system which can use either the code or a UML model as input to the generator. The advantage of this model is that the system is self contained. The annotations in the code help to build the infrastructure code. There are no documents external to the system (e.g. the UML model) to synchronize with the code. The disadvantage is that the business model is not held separate from the code in an abstract form. Holding the model separate is the primary advantage of model based generation.
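To make the code-as-model idea concrete, here is a minimal sketch of a bean marked up in the XDoclet style. The class, table and column names are hypothetical, and the tag spellings follow XDoclet's Hibernate tags as I recall them, so check them against the XDoclet documentation before relying on them.

```java
/**
 * A plain bean whose Javadoc tags carry the persistence mapping.
 * Table and column names are hypothetical; tag names are assumed
 * to match XDoclet's Hibernate module.
 *
 * @hibernate.class table="CUSTOMER"
 */
class Customer {
    private Long id;
    private String name;

    /**
     * Primary key; the generator reads this tag to build the mapping.
     *
     * @hibernate.id generator-class="native" column="CUSTOMER_ID"
     */
    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    /**
     * @hibernate.property column="NAME"
     */
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```

The bean itself compiles and runs without the generator. The tags are only comments to the compiler, which is what keeps the system self contained.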
Make Your Models Active
The more traditional model of software development has the engineers and architects constructing a model of the application before implementation. Often this model is merely printed and tacked on the wall as a reference. In these situations the model on the wall almost never represents the model in the code. I think we have all been there. What if you could use the model to help build the code?
ModelJ, an open source project on SourceForge, uses a representation of the model, stored as XML, and builds Struts and EJBs for J2EE. To add new fields you simply add the definitions to the XML input file, re-run the generator to build the code, then re-compile.
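The workflow is easy to picture with a toy version. The sketch below is not ModelJ's input format or its output; the `entity` and `field` element names and the `BeanGenerator` class are invented for illustration. It reads a small XML model and emits the source for a bean.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class BeanGenerator {
    /** Builds Java source for the single entity described by the (hypothetical) XML model. */
    static String generate(String modelXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(modelXml.getBytes(StandardCharsets.UTF_8)));
        Element entity = doc.getDocumentElement();
        StringBuilder out = new StringBuilder();
        out.append("public class ").append(entity.getAttribute("name")).append(" {\n");
        NodeList fields = entity.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element f = (Element) fields.item(i);
            String name = f.getAttribute("name");
            String type = f.getAttribute("type");
            // Capitalize the field name for the accessor names.
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            out.append("    private ").append(type).append(' ').append(name).append(";\n");
            out.append("    public ").append(type).append(" get").append(cap)
               .append("() { return ").append(name).append("; }\n");
            out.append("    public void set").append(cap).append('(').append(type).append(' ')
               .append(name).append(") { this.").append(name).append(" = ").append(name)
               .append("; }\n");
        }
        out.append("}\n");
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String model = "<entity name=\"Customer\">"
                + "<field name=\"id\" type=\"long\"/>"
                + "<field name=\"name\" type=\"String\"/>"
                + "</entity>";
        System.out.print(generate(model));
    }
}
```

Adding a field means adding one `field` element and re-running the generator. Nothing else in the generator changes.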
What is Java about if it’s not about standards? Is there a standard for model driven generation? Yep.
Model Driven Architecture
Model Driven Architecture (MDA) is a new standard from the Object Management Group (OMG) that defines the architecture of generators that will turn a UML model into code. The first step is to export the model in XMI. This is then taken as input by the generator. Internally this model is called the Platform Independent Model or PIM. This model is then transformed into the Platform Specific Model or PSM. The PSM is then used as the reference model by the templates which will produce the code.
It’s easier to understand this through an example. The Platform Independent Model (PIM) would have the basic class structures, the fields, the relations, and the methods. The Platform Specific Model (PSM) would have all of the classes and interfaces required to implement that scheme. For EJBs you would have the entity and session beans for each of the high level classes. The definition of these beans in the PSM would be used to build the code.
This separation between the models is a solid factoring of concerns, similar in style to the factoring of concerns in a three tier web server. The PIM and PSM (there can be multiple levels of PIM and PSM) architecture is defined in the MDA standard and is the hallmark of all well designed code generators.
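The two-step shape of the pipeline can be sketched in a few lines. This is only an illustration of the PIM to PSM to code flow, not the data model of any real MDA tool. The class names (`PimClass`, `PsmArtifact`) and the three artifacts produced per business class are assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MdaSketch {
    /** Platform Independent Model: a business class and its fields, nothing more. */
    static final class PimClass {
        final String name;
        final List<String> fields;
        PimClass(String name, List<String> fields) { this.name = name; this.fields = fields; }
    }

    /** Platform Specific Model: one concrete artifact the target platform needs. */
    static final class PsmArtifact {
        final String kind;      // "interface" or "class"
        final String typeName;
        final PimClass source;  // the business class this artifact was derived from
        PsmArtifact(String kind, String typeName, PimClass source) {
            this.kind = kind; this.typeName = typeName; this.source = source;
        }
    }

    /** PIM -> PSM: expand one business class into the artifacts an EJB-style target might need. */
    static List<PsmArtifact> toPsm(PimClass pim) {
        List<PsmArtifact> out = new ArrayList<>();
        out.add(new PsmArtifact("interface", pim.name + "Home", pim));
        out.add(new PsmArtifact("interface", pim.name + "Remote", pim));
        out.add(new PsmArtifact("class", pim.name + "Bean", pim));
        return out;
    }

    /** PSM -> code: a trivial template that emits only the type declaration. */
    static String render(PsmArtifact a) {
        return "public " + a.kind + " " + a.typeName
                + " { /* generated from " + a.source.name + " */ }";
    }

    public static void main(String[] args) {
        PimClass customer = new PimClass("Customer", Arrays.asList("id", "name"));
        for (PsmArtifact a : toPsm(customer)) {
            System.out.println(render(a));
        }
    }
}
```

The templates only ever see `PsmArtifact`, so retargeting the generator means swapping `toPsm` and the templates while the PIM stays untouched.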
The primary MDA generators are AndroMDA, ArcStyler, OptimalJ and EMF:
- AndroMDA – This is the open source MDA generator. Developed by Matthias Bohlen and originally named UML2EJB, this generator takes UML exported as XMI and does the transformations to turn it into EJBs, or other types of code, using plug-n-play cartridges. Recently the project has partnered with MagicDraw as its UML authoring mechanism of choice.
- ArcStyler – Richard Hubert of Interactive Objects is the author of the book Convergent Architecture. His company produces ArcStyler, an MDA generator for both J2EE and .NET architectures, now in its 4.0 release.
- OptimalJ – Last year was a big year for Compuware. They moved to a four pronged strategy, one of which was code generation through the third release of their OptimalJ MDA package. OptimalJ is only for Java and J2EE. In addition to code generation they have extended the range of the package to integrate directly with the popular IDEs to enforce no-coding restrictions on sections of the code that are generated by the templates.
- EMF – The Eclipse Modeling Framework is an MDA generator built directly into the Eclipse editor. One of the particularly interesting aspects of EMF is its ability to use the source code as input as well as XMI. When the code is generated it is built with tags that encapsulate the model. These tags can then be read back in as a basis for the model, so you can go in both directions. Addison Wesley has a book on the Eclipse Modeling Framework.
In addition to these large MDA generators there are some smaller, arguably more pragmatic MDA generators that one might consider as MDA lite:
- iQgen – This is a pragmatic MDA generator from innoQ. Out of the box the iQgen generator builds EJBs from UML exported as XMI.
- MDE – Metanology’s entry into the MDA race, named MDE, integrates directly into IDEs and has an integrated UML modeling system. From here MetaPrograms are run that create the code from the UML model. MDE has solutions for both the J2EE architecture and .NET.
Now that we have covered the range of off the shelf generation solutions we can take a look at how to build a custom generator from scratch.
Take Control Yourself
Existing code generation solutions are great at building new code but often have issues integrating into existing code bases. Building your own generator means that you can customize the function and the output of the generator to integrate it into your existing software development cycle. You are also in complete control of the architecture of the entire system.
Each custom generator is different but I can give some recommendations. Most importantly, you should stay with open standards when you have a choice. For example, you should use XML instead of a custom text file format as the input specification. Why? Because the value of the generator is not in the form of the input file, it’s in the function of the entire system. In addition, using an open standard like XML reduces the time for engineers to learn to use and to maintain the system. You will always be asked, “Could we have used something off the shelf?” Showing that you used open standards and tools where they were appropriate will help you defend your decision to build instead of buy.
There are some tools you should consider when you are architecting the generator:
- XSLT – This XML based text templating standard can be used either within a system or standalone. It has an extremely robust control structure and there are a number of excellent tools, including XMLSpy, which will help you build and maintain the templates. The XSLT Cookbook has a chapter dedicated to the use of XSLT for code generation.
- XML – This is an ideal storage format for the input specification to the generator. If you use XML you should use a DTD, XML Schema or Relax NG specification to validate the XML input.
- Jython – Performance is not extremely critical when it comes to generating code. So you can use an embedded scripting language, like Jython, to reduce the complexity of the generator code.
- JSP – Some engineers have used JSP as the text templating tool to build the code from the input specification. It’s advantageous because so many engineers are familiar with the JSP standard.
- JavaDoc – The JavaDoc application has a replaceable back-end. So if the design of your generator involves reading the structure of classes or the comments attached to the code, then you should consider using JavaDoc as the basis of the generator.
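As a rough illustration of how the first two tools combine, the sketch below runs an XSLT template over an XML input specification using the JAXP transformation API that ships with the JDK. The `entity` and `field` vocabulary and the template itself are invented for this example, and a real generator would keep the template in its own file and validate the input first.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltGenerator {
    // Hypothetical template: emits a class with one private field per <field> element.
    static final String TEMPLATE =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/entity\">"
        + "public class <xsl:value-of select=\"@name\"/> {&#10;"
        + "<xsl:for-each select=\"field\">"
        + "    private <xsl:value-of select=\"@type\"/><xsl:text> </xsl:text>"
        + "<xsl:value-of select=\"@name\"/>;&#10;"
        + "</xsl:for-each>"
        + "}&#10;"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the template to the XML input specification and returns the generated source. */
    static String generate(String modelXml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TEMPLATE)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(modelXml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String model = "<entity name=\"Customer\">"
                + "<field name=\"id\" type=\"long\"/>"
                + "<field name=\"name\" type=\"String\"/>"
                + "</entity>";
        System.out.print(generate(model));
    }
}
```

The Java side here is deliberately thin. All of the knowledge about what the output looks like lives in the template, which is the part engineers will read and maintain.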
In addition to using standard tools you should also construct your generator in a sensible multi-tiered style. As I discussed in the MDA section, the proper model for a generator separates the system independent representation of the code from the system dependent representation. The code templates are applied to the system dependent version.
Another advantage to custom code generation is the ability to build code beyond the database access layer. So what types of code can we generate in addition to the database layer?
Code Generation in Context
Generating the database access layer of an application is the ideal starting point for an application because it creates a solid foundation for the business logic and user interface. Because the code is written by a program the class, method and instance variable names are extremely consistent.
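The consistency comes from the fact that every name is derived by the same rule. A minimal sketch of such a rule, assuming upper-case, underscore-separated column names (the `Naming` class and its conventions are hypothetical, not taken from any particular generator):

```java
class Naming {
    /** Converts a column name such as CUSTOMER_ID to a field name such as customerId. */
    static String fieldName(String column) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : column.toLowerCase().toCharArray()) {
            if (c == '_') {
                upperNext = true;   // the underscore is dropped, the next letter is capitalized
                continue;
            }
            sb.append(upperNext ? Character.toUpperCase(c) : c);
            upperNext = false;
        }
        return sb.toString();
    }

    /** Converts a column name such as FIRST_NAME to an accessor name such as getFirstName. */
    static String getterName(String column) {
        String field = fieldName(column);
        return "get" + Character.toUpperCase(field.charAt(0)) + field.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(fieldName("CUSTOMER_ID"));
        System.out.println(getterName("FIRST_NAME"));
    }
}
```

Because every template calls the same helper, a change to the naming convention is made once and shows up everywhere on the next generation run.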
The advantages of code generation, consistent high quality code produced quickly, can also be applied to other areas, such as web services, user interface, unit tests, or even business logic. But these are best done in a layering process on top of a generated foundation of database access code.
The Model Driven Architecture (MDA) generators in particular look to extend the reach of the generator out from the foundation of the database access layer to build the entire code base directly from the model. Unfortunately UML 1.0 doesn’t have the semantic modeling capabilities to generate an entire application including the business logic. UML 2.0 is in the works with just this problem in mind.
In the meantime you should be on the lookout for tools that will help you generate portions other than the business logic layer, while keeping in mind that you will have to do customization to fit your needs.
All Is Not So Rosy
With all of this glowing talk about how easy it is to use and write code generators you may be wondering what the downsides are, and you would be right to. The primary problem is lack of maintenance, and this is the one most people have experienced. You often see the remnants of once-generated code in the source code control system which are now tweaked by hand. The solution for this is to integrate the generator as part of the build process and to never check in the generated code. This keeps the generator in the development lifecycle and ensures its maintenance.
The second most common problem with code generators is more cultural than technical. Perhaps it’s a side effect of the inherent skepticism in a good engineer, but most engineers suffer from a fear of the unknown. To successfully integrate a generator into the development process requires training, documentation and the gentle assuaging of fears. As engineers we like to focus on the technical aspects of a problem so these critical adoption issues are often overlooked.
I could write several articles on code generator adoption issues but suffice to say that a perspective shift is required. You need to think about the generator as another engineer on the team who owns and maintains sections of the code in an active manner. This understanding needs to be clear to all of the technical staff involved with the project.
Code generation is an extremely valuable technique when you acknowledge and actively work around the pitfalls. Whether you build a generator yourself or use one off the shelf, the question is the same: are you getting a good return on your effort? With hand-coding, probably not. But with a well architected and constructed generator you are going to be building a lot of high quality code from a central knowledge source, and that will let you concentrate on writing the fun and interesting parts of the application instead of getting bogged down in plumbing.
If you are still unconvinced then why are you using code generation today? All compilers are code generators, and they have proven their worth. Why should we stop with creating VM code or assembler? Why not extend that into application code?
- Code Generation Network – Dedicated to code generation, this site contains interviews and articles about generation, as well as a database of all of the known code generators.
- Code Generation Alliance – An education and advocacy group for Code Generation.
- Program Generators with XML and Java – A book centered on using XML as an input to Java generators that build Java.
- Generative Programming – The seminal work on the theory of code generation.
- Code Generation In Action – Covers the different models of generators and applies them to a wide variety of target outputs, including database access code.
About the Author
Jack Herrington is a Senior Software Engineer with over twenty years of production coding experience. He is the author of Code Generation in Action and the editor-in-chief of the Code Generation Network.