If you ask someone who uses code generation, chances are that they swear by it. Code generation saves time and effort and can greatly improve the maintainability of a system. Letting your computer write code for you is so compelling that it's hard to imagine why any developer wouldn't embrace it. Yet code generation remains a bit of a black art, practiced by a few and met with skepticism and distrust by the rest.
If you haven't yet been bitten by the code generation bug, let me explain why I think code generation matters to Java developers.
What is code generation?
Code generation can mean a number of things, but here I'm referring to the act of having a computer program generate Java source files, ones that you would otherwise have needed to write by hand, which are compiled as part of your project. With this style of code generation (known as active code generation, for those keeping score), the code generator owns the generated code and the programmers don't edit the generated code directly.
There are many ways to think about this type of code generation. I find it useful to mentally model it as template expansion. A template is an outline of the final product with the details left to be filled in, sort of like a JSP page. Input data is fed into a template, to fill in the blanks, and the result is a customized source file based on the input data. I'm not trying to suggest that all code generators use templates or even that somehow that is the preferred way to implement a code generator. It's just a simple and easily understood abstract model to help us think about code generation. In this model, the template encapsulates code and logic that we are trying to re-use across all of our generated classes.
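To make the template model concrete, here is a minimal sketch in Java. It is only an illustration of the mental model, not how any particular generator works: the class name BeanTemplate, the ${...} placeholder syntax and the input keys are all assumptions invented for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy template expander; every name here is a hypothetical illustration. */
class BeanTemplate {

    // The template: an outline of a Java class with ${...} blanks to fill in.
    static final String TEMPLATE = String.join("\n",
        "public class ${className} {",
        "    private ${type} ${field};",
        "",
        "    public ${type} get${Field}() {",
        "        return ${field};",
        "    }",
        "}",
        "");

    /** Replaces every ${key} placeholder with its value from the input data. */
    static String expand(String template, Map<String, String> data) {
        String result = template;
        for (Map.Entry<String, String> entry : data.entrySet()) {
            result = result.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // The input data: the only part that changes from one generated class to the next.
        Map<String, String> data = new LinkedHashMap<>();
        data.put("className", "Customer");
        data.put("type", "String");
        data.put("field", "name");
        data.put("Field", "Name");
        System.out.print(expand(TEMPLATE, data));
    }
}
```

Running main prints the source of a small Customer class with a name field and a getName() accessor. Feeding different input data into the same template yields a different class with the same shape, which is the sense in which the template holds the code being re-used.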
The source code and the resulting class files might be indistinguishable from good old-fashioned cut-and-paste coding, but from the programmer's perspective the code can be treated as a single unit. When we test a generated class, we are testing all the classes, at least partially. When bugs are found, they can be fixed in the generator and the fix goes in across the whole system. Consider the benefit of a completely consistent generated subsystem. Naming is consistent. Error handling is consistent. Logging is consistent. When we understand how one generated class works, we understand all of them. Code generation lets you leverage code across a code base in ways you otherwise might not be able to.
Code generation as re-use
I find the code-generation-as-re-use argument quite compelling. It's hard to think of a virtue that has been more revered by coders of every platform, system and language orientation than code re-use. From simple subroutines and modules to modern object-oriented techniques, the methods have varied, but the idea of writing code once and re-using it has been fundamental to both the art and science of coding.
We usually think of re-use in terms of language and runtime mechanisms. We put code in shared libraries to be used by any number of callers. We use inheritance to share code with subclasses and use interfaces to allow our classes to fit into and re-use existing infrastructures. Today's hot aspect-oriented programming (AOP) techniques are an attempt to extract orthogonal concerns from classes, allowing them to be written once and re-used.
Given our predisposition to strive for re-use in any form, it seems odd that many developers don't give much consideration to the value of code generation. Part of the problem is that code generation can look like a hack. If language and runtime systems provided re-use techniques that were deemed acceptable, then code generation would not be seen as a primitive pre-processing hack that we should try to avoid.
There may be a bit of justification for that view. Code generation is often the most valuable in those dark, messy corners of the systems we are working with, in places where our language and tools seem inadequate. Take EJB development, for example. I don't want to bury (or praise) EJB here, but it's hard to deny that EJBs are messy and complicated. XDoclet is a code generation tool that has found its niche in relieving at least some of the pain associated with EJB development. But even here, XDoclet is often seen as a necessary evil in keeping the EJB monster at bay rather than a positive use of code generation.
Don't think that because the examples of code generation you see are often band-aids around inflexible or poorly designed systems that code generation is necessarily a sign of a poorly designed system. Code generation is a valid technique that provides a valuable type of re-use.
Code generation as high level language
I'll make another argument in favor of code generation as a relevant modern technique: generating code is like writing code in a higher level language. Think back to our abstract model of code generation. Our template is expanded based on some high level data we feed into it, and to change the code, we merely change the input. In essence, we are writing code in a high level language compiled by the code generator.
Is Java necessarily the best language to express the concepts you need to express? What if a UML-style model is the best high level way to express your business objects? Maybe a few Javadoc comments on your class are a better way to express certain details of your system? Sometimes, the best way to program a system isn't with the language it is implemented in.
My first experience with source level code generation was with a parser generator called yacc (yet another compiler-compiler). Yacc takes as its input a grammar for a language: rules like "an assignment statement is a variable name followed by an equals sign and then an expression". From this high level grammar, it generates C code that implements a parser for that language.
Very few people would consider writing a parser by hand for anything but the simplest of expressions. We want to work at a more abstract level, but we don't want to lose the benefits of a static compiled implementation.
That's the power of code generation. You can work at a higher, more general level without having to leave your domain-specific language environment.
A code generation case study
I worked on a project that migrated a large business model from hand coded business objects to business objects generated from a UML model. There were strict requirements for security, persistence and extensibility. With the hand coded objects, each class worked slightly differently. As much as we tried to stick to well-defined conventions, each class looked slightly different. When changes needed to be made across the whole system, the developers had to apply the changes across the whole system. The potential for bugs was enormous and the developers spent a lot of time just maintaining the system and keeping the model, the code and the relational mappings in sync.
When a code generator was added to the system, everything changed. The UML model became king and all the logic for the various aspects of the system was coded once in the code generator. Writing the code generator took a lot of effort, and maintaining it was far from free. But the cost of maintaining the generator was much less than maintaining the entire hand coded business model.
Code generation was a huge win in this system. But were there other options? An AOP system could have provided consistent security, persistence, and logging layers across the entire system. A managed component system like EJB could have provided a similar set of layers: transactions, security, persistence, and so on. Both of these are interesting options, but neither quite fit our needs the way code generation did.
A third possibility would have been to move to a completely dynamic abstract business object system. I call this the framework approach. Instead of implementing the business model as static Java objects, we could invoke generic methods for getting and setting properties and for invoking business methods on our abstract objects. Our framework would figure out, based on the properties of those dynamic objects, how to apply all the layers of the system in a consistent way.
The framework approach is surprisingly similar to code generation. All the decisions the code generator made at compile time about the behavior of the objects could be made in a similar way at runtime by the framework. And the runtime framework could do many things not possible at compile time. But compile-time generation has its own benefits too. Having source code pass through the watchful eyes of the compiler helps find problems faster, especially problems stemming from changes to the original input sources. Also, having the generated source code available can speed up debugging.
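To make the comparison concrete, here is a minimal sketch of what a dynamic business object in the framework approach might look like. It is not the design we considered on the project; the DynamicObject class, its method names and the trivial audit log standing in for a real security or persistence layer are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical dynamic business object; not taken from the project described. */
class DynamicObject {
    private final String type;
    private final Map<String, Object> properties = new HashMap<>();
    private final List<String> auditLog = new ArrayList<>();

    DynamicObject(String type) {
        this.type = type;
    }

    /** Generic setter: the single code path where shared layers are applied. */
    void set(String name, Object value) {
        auditLog.add(type + "." + name + " = " + value); // stand-in for a real layer
        properties.put(name, value);
    }

    /** Generic getter: an unknown property name is only detected at runtime. */
    Object get(String name) {
        if (!properties.containsKey(name)) {
            throw new IllegalArgumentException("No property '" + name + "' on " + type);
        }
        return properties.get(name);
    }

    List<String> auditLog() {
        return auditLog;
    }

    public static void main(String[] args) {
        DynamicObject customer = new DynamicObject("Customer");
        customer.set("name", "Ada");
        System.out.println(customer.get("name"));
        try {
            customer.get("nmae"); // the typo compiles fine and fails only when executed
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The shared behavior lives in one place, just as it would in a generator's template. The difference shows up in the misspelled property name: with a generated Customer class, a call to a nonexistent accessor would be rejected by the compiler, whereas here the mistake surfaces only when that line runs.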
The goal isn't to say that code generation is necessarily better than the other possible approaches. In some cases it is, and in other cases it isn't. What I do want to say is that code generation should be considered on equal footing with these other techniques. Although code generation is often looked down upon as a dirty hack, it is really a powerful high level way to reuse code across a large system.
Many developers turn their noses up when code generation is mentioned, but given all the benefits of code generation it's hard to understand why. I strongly encourage using code generation wherever it is applicable on a project. The rewards are well worth the effort, and the occasional upturned nose.
About the Author
Norman Richards has ten years of software development experience, and has worked with code generation for much of that time. He is an avid XDoclet user and evangelist. Norman lives in Austin, Texas. Norman is the co-author of XDoclet in Action.