January 2004
Introduction
Code generation is nothing new, especially for Java programmers, but it is
still confusing to most people because of the variety of code generation models
and solutions. This article will help you cut through the fog by providing a
summary of the popular models and solutions in the Java world today.
To simplify matters we will concentrate on tools that aid in generating the
code for the database back-end because, for reasons that will become clear,
it is the ideal starting point for generation.
Code Generation to the Rescue
Let’s step back for a second. What is the best part about software engineering?
For most of us it is the creation of useful products for our customers, and the
look in someone’s eye when you deliver a genuinely critical piece of software
that is beneficial to them. The technology is important, but only as a means
to an end. For example, the code that gets information in and out of the database,
what Sun likes to call plumbing, is interesting, but it’s not why we got
into this business.
It’s ironic that Java, a language and a set of tools invented by Sun,
requires a lot of plumbing code to simply get data in and out of the database.
There are a number of persistence options, and some options, like EJB, take
up to seven classes and interfaces per database table. It’s no great wonder
that, of the 150 tools listed in the Code Generation Network database, the
majority are for Java, and that the majority of those are for building database
persistence code.
High quality generators will have these characteristics:
- It will produce quality code. The kind of code we would write ourselves
if we had the time.
- It will be standards compliant when applicable.
- It will build the majority of the grunt work code.
- It will adapt to changes in the requirements of the application.
- It will provide us flexibility in terms of technology and deployment options.
- It should be reasonably well documented and supported.
- It should provide a good return on our investment.
It’s the last item, the return on investment, that you should be looking
for as we go through the generators. If you are going to put a lot of effort
into getting your code or model into a form that is usable by the generator,
it should give you a nice return on that effort. Of course it will build the
code, but what else can it do for you? Documentation? Models? RPC interfaces?
Unit tests? Sometimes this return extends past the tool itself, so keep an eye
out for whether you understand how the generator works and how you could
extend it yourself.
We will start off the survey of tools by looking at those you can get off the
shelf.
Out of the Box Generation
All generators are custom generators at some level. Some are built from scratch
for a specific application or environment. Others are built by taking an
off-the-shelf generator and customizing it to suit your needs. When
you use something off the shelf you leverage the documentation, the standards,
and the user base. Because of these advantages we start by looking at the code
generators for Java you can use today.
Getting More From Your Code
In the extreme programming model the code is the documentation and the most
accurate business model for the application. To fit with this model there are
generators that use the code as the definition of the input model. When we are
generating database code this means the generator will be analyzing the existing
beans and creating the infrastructure classes and interfaces to support those
beans.
The primary example of this model is XDoclet.
Using special JavaDoc tags in the comments of existing Java beans, the XDoclet engine can build database
access code based on a variety of different technologies. These include EJB/BMP,
JDO, Castor and Hibernate. In addition, with XDoclet's extensible architecture
it can now generate code for JMX, Web Services and Struts. Manning covers this
topic with its new book, XDoclet
in Action.
Another version of this is the Eclipse Modeling Framework generation system,
which can use either the code or a UML model as input to the generator. The
advantage of this model is that the system is self contained. The annotations
in the code help to build the infrastructure code. There are no documents external
to the system (e.g. the UML model) to synchronize with the code. The disadvantage
is that the business model is not held separate from the code in an abstract
form. Holding the model separate is the primary advantage of model based generation.
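To make the code-as-input idea concrete, here is a minimal sketch of a generator that treats a hand-written bean as its model and derives a SQL table definition from it. It is not how XDoclet works internally: XDoclet parses the source and reads the JavaDoc tags in its comments, while this sketch uses reflection, which is simpler but can only see the compiled structure, not the comments. The class names and the Java-to-SQL type mapping are hypothetical.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Locale;
import java.util.stream.Collectors;

public class BeanToSql {

    // The hand-written bean acts as the input model.
    public static class Customer {
        private long id;
        private String name;
        private String email;
    }

    // Hypothetical Java-to-SQL type mapping; a real generator would make this configurable.
    static String sqlType(Class<?> type) {
        if (type == long.class || type == Long.class) return "BIGINT";
        if (type == int.class || type == Integer.class) return "INTEGER";
        if (type == String.class) return "VARCHAR(255)";
        throw new IllegalArgumentException("No SQL mapping for " + type.getName());
    }

    // Derives a CREATE TABLE statement from the bean's instance fields.
    public static String createTable(Class<?> bean) {
        String columns = Arrays.stream(bean.getDeclaredFields())
                .filter(f -> !Modifier.isStatic(f.getModifiers()) && !f.isSynthetic())
                // getDeclaredFields() has no guaranteed order, so sort for stable output.
                .sorted(Comparator.comparing(Field::getName))
                .map(f -> "  " + f.getName() + " " + sqlType(f.getType()))
                .collect(Collectors.joining(",\n"));
        return "CREATE TABLE " + bean.getSimpleName().toUpperCase(Locale.ROOT)
                + " (\n" + columns + "\n);";
    }

    public static void main(String[] args) {
        System.out.println(createTable(Customer.class));
    }
}
```

Running main prints a CREATE TABLE CUSTOMER statement with the columns email, id and name. Adding a field to Customer and rerunning the generator updates the DDL, which is the appeal of this style: the bean stays the single source of truth.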
Make Your Models Active
The more traditional model of software development has the engineers and architects
constructing a model of the application before implementation. Often this model
is merely printed and tacked on the wall as a reference. In these situations
the model on the wall almost never represents the model in the code. I think
we have all been there. What if you could use the model to help build the code?
ModelJ, an open
source project on SourceForge, uses a representation of the model stored as
XML and builds Struts and EJB code for J2EE. To add new fields you simply add the
definitions to the XML input file, re-run the generator to build the code, then
re-compile.
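A minimal sketch of the model-as-XML approach, using only the JDK's DOM parser. The <entity>/<field> vocabulary and the class name are hypothetical, not ModelJ's actual format, and the output is a plain JavaBean rather than Struts or EJB code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ModelToBean {

    // Generates JavaBean source from a single <entity> element and its <field> children.
    public static String generate(String modelXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(modelXml.getBytes(StandardCharsets.UTF_8)));
        Element entity = doc.getDocumentElement();
        StringBuilder out = new StringBuilder();
        out.append("public class ").append(entity.getAttribute("name")).append(" {\n");
        NodeList fields = entity.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element f = (Element) fields.item(i);
            String name = f.getAttribute("name");
            String type = f.getAttribute("type");
            String cap = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            // One private field plus a getter and setter per model field.
            out.append("    private ").append(type).append(' ').append(name).append(";\n");
            out.append("    public ").append(type).append(" get").append(cap)
               .append("() { return ").append(name).append("; }\n");
            out.append("    public void set").append(cap).append('(').append(type).append(' ')
               .append(name).append(") { this.").append(name).append(" = ").append(name).append("; }\n");
        }
        out.append("}\n");
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String model = "<entity name=\"Customer\">"
                + "<field name=\"name\" type=\"String\"/>"
                + "<field name=\"email\" type=\"String\"/>"
                + "</entity>";
        System.out.print(generate(model));
    }
}
```

Adding a <field> element to the model and rerunning produces the new property with its accessors, mirroring the add, regenerate, recompile cycle described above.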
What is Java about if it’s not about standards? Is there a standard for
model driven generation? Yep.
Model Driven Architecture
Model Driven Architecture
(MDA) is a new standard from the Object
Management Group (OMG) that defines the architecture of generators that
turn a UML model into code. The first step is to export the model as XMI. This
is then taken as input by the generator. Internally this model is called the
Platform Independent Model, or PIM. This model is then transformed into the Platform
Specific Model, or PSM. The PSM is then used as the reference model by the templates
that produce the code.
It’s easier to understand this through an example. The Platform Independent
Model (PIM) would have the basic class structures, the fields, the relations,
and the methods. The Platform Specific Model (PSM) would have all of the classes
and interfaces required to implement that scheme. For EJBs you would have the
entity and session beans for each of the high level classes. The definition of these
beans in the PSM would be used to build the code.
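The PIM-to-PSM step can be sketched in a few lines of Java. This is a toy illustration, not the API of any MDA tool: the PIM is reduced to a class name, the transformation expands it into a hypothetical set of EJB 2.x-style artifacts (entity bean, local and local home interfaces, session facade), and the "template" renders each artifact as an empty type declaration.

```java
import java.util.List;

public class PimToPsm {

    // PIM: platform-independent description of a business class (reduced to its name here).
    public record PimClass(String name) {}

    // PSM: one platform-specific artifact needed to implement a PIM class.
    public record PsmArtifact(String kind, String typeName, boolean isInterface) {}

    // Transformation step: expands one PIM class into EJB-style artifacts.
    // The artifact list and naming suffixes are illustrative assumptions.
    public static List<PsmArtifact> toEjbPsm(PimClass pim) {
        String n = pim.name();
        return List.of(
                new PsmArtifact("entity bean", n + "Bean", false),
                new PsmArtifact("local interface", n + "Local", true),
                new PsmArtifact("local home interface", n + "LocalHome", true),
                new PsmArtifact("session facade", n + "FacadeBean", false));
    }

    // Template step: renders a PSM artifact as a stub Java type declaration.
    public static String render(PsmArtifact a) {
        return "// " + a.kind() + "\n"
                + "public " + (a.isInterface() ? "interface " : "class ") + a.typeName() + " { }\n";
    }

    public static void main(String[] args) {
        for (PsmArtifact a : toEjbPsm(new PimClass("Customer"))) {
            System.out.print(render(a));
        }
    }
}
```

Keeping the transformation and the rendering as separate functions is the point of the design: targeting a different platform means writing a new transformation and new templates, while the PIM stays untouched.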
This separation between the models is a solid factoring of concerns, similar
in style to the factoring of concerns in a three-tier web application. The PIM and
PSM architecture (there can be multiple levels of each) is defined in
the MDA standard and is the hallmark of all well-designed code generators.
The primary MDA generators are AndroMDA, ArcStyler, OptimalJ and EMF:
- AndroMDA –
This is the open source MDA generator. Developed by Matthias Bohlen and originally
named UML2EJB, this generator takes UML exported as XMI and does the transformations
to turn it into EJBs, or other types of code, using plug-n-play cartridges.
Recently the project has partnered with MagicDraw
as its UML authoring tool of choice.
- ArcStyler –
Richard Hubert of Interactive
Objects is the author of the book Convergent
Architecture. His company produces ArcStyler, an MDA generator for both
J2EE and .NET architectures, now in its 4.0 release.
- OptimalJ
– Last year was a big year for Compuware.
They moved to a four-pronged strategy, one prong of which was code generation through
the third release of their OptimalJ
MDA package. OptimalJ is only for Java and J2EE. In addition to code generation
they have extended the range of the package to integrate directly with the
popular IDEs to enforce no-coding restrictions on sections of the code that
are generated by the templates.
- EMF – The
Eclipse Modeling Framework is an MDA generator built directly into the Eclipse
editor. One of the particularly interesting aspects of EMF is its ability
to use the source code as input as well as XMI. When the code is generated
it is built with tags that encapsulate the model, so the generated code can be
read back in as the basis for the model. In other words, you can go in both directions with
the model. Addison Wesley has a book on the Eclipse
Modeling Framework.
In addition to these large MDA generators there are some smaller, arguably
more pragmatic MDA generators that one might call MDA lite:
- iQgen
– This is a pragmatic MDA generator from innoQ.
Out of the box the iQgen generator builds EJBs from UML exported as XMI.
- MDE
– Metanology’s
entry into the MDA race, named MDE, integrates directly into IDEs and has an
integrated UML modeling system. From here MetaPrograms are run that create
the code from the UML model. MDE has solutions for both the J2EE architecture
and .NET.
If you are interested in MDA you should check out the books MDA
Explained and Model
Driven Architecture, as well as the Code Generation Network’s page
on MDA.
Now that we have covered the range of off the shelf generation solutions we
can take a look at how to build a custom generator from scratch.
Take Control Yourself
Existing code generation solutions are great at building new code but often
have issues integrating into existing code bases. Building your own generator
means that you can customize the function and the output of the generator to
integrate it into your existing software development cycle. You are also in
complete control of the architecture of the entire system.
Each custom generator is different but I can give some recommendations. Most
importantly, you should stay with open standards when you have a choice. For
example, you should use XML instead of a custom text file format as the input
specification. Why? Because the value of the generator is not in the form of
the input file but in the function of the entire system. In addition,
using an open standard like XML reduces the time for engineers to learn to use
and to maintain the system. You will always be asked, “Could we have used
something off the shelf?” Showing that you used open standards and tools
where they were appropriate will help you defend your decision to build instead
of buy.
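One practical payoff of choosing XML for the input specification is that the JDK can validate it against a schema out of the box, so a malformed specification is rejected before any code is generated. Below is a minimal sketch using the standard javax.xml.validation API; the schema and the entity/field vocabulary it describes are hypothetical.

```java
import java.io.StringReader;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SpecValidator {

    // Hypothetical schema: an <entity name="..."> holding one or more <field name="..." type="..."/>.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"entity\"><xs:complexType>"
      + "<xs:sequence>"
      + "<xs:element name=\"field\" maxOccurs=\"unbounded\"><xs:complexType>"
      + "<xs:attribute name=\"name\" type=\"xs:string\" use=\"required\"/>"
      + "<xs:attribute name=\"type\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:sequence>"
      + "<xs:attribute name=\"name\" type=\"xs:string\" use=\"required\"/>"
      + "</xs:complexType></xs:element>"
      + "</xs:schema>";

    // Returns an empty Optional if the spec is valid, otherwise the validation error message.
    public static Optional<String> validate(String specXml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(specXml)));
            return Optional.empty();
        } catch (SAXException e) {
            return Optional.of(String.valueOf(e.getMessage()));
        }
    }

    public static void main(String[] args) throws Exception {
        String good = "<entity name=\"Customer\"><field name=\"email\" type=\"String\"/></entity>";
        String bad = "<entity name=\"Customer\"><field name=\"email\"/></entity>"; // missing type
        System.out.println("good: " + validate(good).orElse("valid"));
        System.out.println("bad: " + validate(bad).orElse("valid"));
    }
}
```

Running this check as the generator's first step means engineers get a precise error about the specification instead of confusing output from the templates.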
There are some tools you should consider when you are architecting the generator:
- XSLT –
This XML-based text templating standard can be used either within a system or
standalone. It has an extremely robust control structure and there are a number
of excellent tools, including XMLSpy, which will help you build and maintain
the templates. The XSLT
Cookbook has a chapter dedicated to the use of XSLT for code generation.
- XML – This is
an ideal storage format for the input specification to the generator. If you
use XML you should use a DTD, XML Schema or Relax NG specification to validate
the XML input.
- Jython – Performance
is rarely critical when generating code, so you can use
an embedded scripting language, like Jython, to reduce the complexity of the
generator code.
- JSP –
Some engineers have used JSP as the text templating tool to build the code
from the input specification. It’s advantageous because so many engineers
are familiar with the JSP standard.
- JavaDoc
– The JavaDoc application has a replaceable back-end, called a doclet. So if the design
of your generator involves reading the structure of classes or the comments
attached to the code, then you should consider using JavaDoc as the basis
of the generator.
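As a sketch of XSLT used as the templating layer, the following runs a stylesheet through the JDK's built-in javax.xml.transform API to turn an XML model into a Java interface with one getter per field. The <entity>/<field> vocabulary and the stylesheet are hypothetical; a real template would live in its own .xsl file rather than in a Java string.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltGenerator {

    // Hypothetical XSLT 1.0 template. xsl:text keeps the indentation, because
    // whitespace-only text nodes are stripped from stylesheets.
    static final String TEMPLATE =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/entity\">"
      + "public interface <xsl:value-of select=\"@name\"/> {\n"
      + "<xsl:for-each select=\"field\">"
      + "<xsl:text>    </xsl:text><xsl:value-of select=\"@type\"/> get"
      // Upper-case the first letter of the field name (XSLT 1.0 has no upper-case()).
      + "<xsl:value-of select=\"translate(substring(@name,1,1),"
      + "'abcdefghijklmnopqrstuvwxyz','ABCDEFGHIJKLMNOPQRSTUVWXYZ')\"/>"
      + "<xsl:value-of select=\"substring(@name,2)\"/>();\n"
      + "</xsl:for-each>"
      + "}\n"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the template to the XML model and returns the generated source text.
    public static String generate(String modelXml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(TEMPLATE)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(modelXml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(generate("<entity name=\"Customer\">"
                + "<field name=\"name\" type=\"String\"/>"
                + "<field name=\"email\" type=\"String\"/></entity>"));
    }
}
```

For the Customer model in main, the output is an interface declaring getName() and getEmail(). Keeping the template in XSLT means it can be edited and tested with standard XSLT tools, independently of the Java code that drives it.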
In addition to using standard tools you should also construct your generator
in a sensible multi-tiered style. As I discussed in the MDA section, the proper
model for a generator separates the platform-independent representation of the
code from the platform-specific representation. The code templates are applied
to the platform-specific version.
Another advantage to custom code generation is the ability to build code beyond
the database access layer. So what types of code can we generate in addition
to the database layer?
Code Generation in Context
Generating the database access layer of an application is the ideal starting
point for an application because it creates a solid foundation for the business
logic and user interface. Because the code is written by a program, the class,
method and instance variable names are extremely consistent.
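That consistency comes from the generator applying one naming rule everywhere. A minimal sketch of such a rule, mapping underscore-separated database names to Java identifiers; the convention itself (snake_case columns, camelCase fields, get-prefixed accessors) is an assumption, and a real generator would also need to handle reserved words and acronyms.

```java
import java.util.Locale;

public class Naming {

    // Converts a database identifier such as "first_name" into lowerCamelCase ("firstName").
    public static String toFieldName(String column) {
        StringBuilder sb = new StringBuilder();
        boolean upperNext = false;
        for (char c : column.toLowerCase(Locale.ROOT).toCharArray()) {
            if (c == '_') {
                // Ignore leading underscores; otherwise capitalize the next letter.
                upperNext = sb.length() > 0;
            } else if (upperNext) {
                sb.append(Character.toUpperCase(c));
                upperNext = false;
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Converts a table name such as "customer_order" into UpperCamelCase ("CustomerOrder").
    public static String toClassName(String table) {
        String field = toFieldName(table);
        return field.isEmpty() ? field : Character.toUpperCase(field.charAt(0)) + field.substring(1);
    }

    // Every accessor follows the same rule, so "first_name" always becomes getFirstName().
    public static String toGetterName(String column) {
        return "get" + toClassName(column);
    }

    public static void main(String[] args) {
        System.out.println(toClassName("customer_order"));
        System.out.println(toFieldName("first_name"));
        System.out.println(toGetterName("first_name"));
    }
}
```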
The advantages of code generation, consistent high quality code produced quickly,
can also be applied to other areas, such as web services, user interface, unit
tests, or even business logic. But these are best done in a layering process
on top of a generated foundation of database access code.
The Model Driven Architecture (MDA) generators in particular look to extend
the reach of the generator out from the foundation of the database access layer
to build the entire code base directly from the model. Unfortunately UML 1.x
doesn’t have the semantic modeling capabilities to generate an entire
application including the business logic. UML 2.0 is in the works with just
this problem in mind.
In the meantime you should be on the lookout for tools that will help you
generate portions other than the business logic layer, keeping in mind
that you will have to customize them to fit your needs.
All Is Not So Rosy
With all of this glowing talk about how easy it is to use and write code generators
you may be wondering what the downsides are, and you would be right to. The
primary problem is lack of maintenance, and this is the one most people have
experienced. You often see the remnants of once-generated code in the source
code control system, now tweaked by hand. The solution for this is
to integrate the generator as part of the build process and to never check in
the generated code. This keeps the generator in the development lifecycle and
ensures its maintenance.
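A minimal sketch of the generation step as it might run inside the build, before compilation. It wipes and recreates its output directory on every run and stamps each file with a do-not-edit header, so hand edits cannot quietly survive. The directory name build/generated, the header text and the class name are hypothetical, and wiring the step into Ant or Maven is not shown; the output directory is what you would exclude from source control.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

public class GenerateStep {

    public static final String HEADER =
        "// GENERATED CODE - DO NOT EDIT. Change the model and rebuild instead.\n";

    // Deletes outputDir (if present), recreates it, and writes each generated file into it.
    public static void regenerate(Path outputDir, Map<String, String> sources) throws IOException {
        if (Files.exists(outputDir)) {
            List<Path> stale;
            try (Stream<Path> walk = Files.walk(outputDir)) {
                // Reverse order puts children before their parent directories.
                stale = walk.sorted(Comparator.reverseOrder()).toList();
            }
            for (Path p : stale) {
                Files.delete(p);
            }
        }
        Files.createDirectories(outputDir);
        for (Map.Entry<String, String> file : sources.entrySet()) {
            Files.writeString(outputDir.resolve(file.getKey()), HEADER + file.getValue());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output location; list it in your version control ignore file.
        Path out = Path.of("build", "generated");
        regenerate(out, Map.of("Customer.java", "public class Customer { }\n"));
        System.out.println("Regenerated sources in " + out.toAbsolutePath());
    }
}
```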
The second most common problem with code generators is more cultural than
technical. Perhaps it’s a side effect of the inherent skepticism in a
good engineer, but most engineers suffer from a fear of the unknown. To successfully
integrate a generator into the development process requires training, documentation
and the gentle assuaging of fears. As engineers we like to focus on the technical
aspects of a problem so these critical adoption issues are often overlooked.
I could write several articles on code generator adoption issues, but suffice it
to say that a perspective shift is required. You need to think about the generator
as another engineer on the team who owns and maintains sections of the code
in an active manner. This understanding needs to be clear to all of the technical
staff involved with the project.
Conclusion
Code generation is an extremely valuable technique when you acknowledge and
actively work around its pitfalls. Whether you build a generator yourself or use
one off the shelf, ask whether you are getting a good return on your effort. With hand-coding
you probably are not. But with a well architected and constructed generator you are
going to be building a lot of high quality code from a central knowledge source
and that will let you concentrate on writing the fun and interesting parts of
the application instead of getting bogged down in plumbing.
If you are still unconvinced, consider that you are already using code generation today.
All compilers are code generators, and they have proven their worth. Why should
we stop at creating VM code or assembler? Why not extend that into application
code?
About the Author
Jack Herrington is a Senior Software Engineer with over twenty years of production
coding experience. He is the author of Code Generation in Action and the editor-in-chief
of the Code Generation Network.