If y’all are not sold on code generation yet, I have two great resources for you: one is Scott Hanselman’s talk at PDC 2005 (Code Generation - Architecting a New Kind of Reuse); another valuable resource is a book by Kathleen Dollard, Code Generation in Microsoft .NET.
For about a year now, on and off, I’ve been trying to find a way to cut down on code repetition. There’s only so much you can refactor, and you’ll inevitably end up with code that is not the same but is structured the same way: a perfect candidate for code generation.
Now, I don’t want to get into a discussion of codegen and ORM tools out there. This is not a post about CodeSmith vs LLBLGen (whoever came up with that kind of name?!), or anything like that.
I’ve tried to pick a codegen tool or find an approach that would best reflect model entities and their relationships within our product, but nothing fit the bill perfectly. Scott’s presentation finally filled in the blanks.
Next, I picked up a copy of Kathleen’s book (perfect timing!) and developed a codegen harness similar to the one she presents. For various reasons, I can’t publish my code or XSLT templates here, but I’ll list some key points from the book to get your juices flowing. First, though, let’s go over terminology.
Terminology
Metadata (in this context) is some kind of definition of your requirements. Generally, it’s an XML file which describes tables in your database, spells out column types in .NET and T-SQL terms, etc. Metadata should be pre-processed as much as possible, meaning you should make it as detailed as you can. For example, in my metadata file I list every column in a given table, its type in .NET and SQL Server, its maximum length, whether it allows nulls, its singular and plural forms (handy for generating collections), etc. It so happens that my “requirements” are the database itself.
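To make the shape of such a file concrete, here is a minimal sketch in Python. It is not the harness described in this post: an in-memory SQLite database stands in for SQL Server, and the `users` table, the element and attribute names, the type map and the naive singular/plural logic are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from declared SQL types to .NET types.
NET_TYPES = {
    "INTEGER": "System.Int32",
    "TEXT": "System.String",
    "VARCHAR": "System.String",
}

def extract_metadata(conn):
    """Walk every table and describe its columns as an XML tree."""
    root = ET.Element("metadata")
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        singular = table_name.rstrip("s").capitalize()  # naive, illustration only
        table = ET.SubElement(root, "table", name=table_name,
                              singular=singular, plural=singular + "s")
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        for _cid, col, sql_type, notnull, _default, _pk in conn.execute(
                f"PRAGMA table_info({table_name})"):
            base, _, rest = sql_type.partition("(")
            ET.SubElement(table, "column",
                          name=col,
                          sqlType=sql_type,
                          netType=NET_TYPES.get(base.upper(), "System.Object"),
                          maxLength=rest.rstrip(")"),
                          allowNulls=str(not notnull).lower())
    return root

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER NOT NULL, name VARCHAR(50))")
metadata = extract_metadata(conn)
print(ET.tostring(metadata, encoding="unicode"))
```

The point is that everything a template might need (both type systems, length, nullability, naming forms) is worked out once, here, rather than inside each template.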
Once you collect (or “extract”) metadata, you feed it to a code generation tool. For the tool to be useful, you need to develop templates. The tool takes the same metadata file and generates code for every available template. For example, my templates produce a single class plus a comparer and a collection class for it (e.g. User, UserComparer, UserCollection), as well as several other auxiliary classes.
A template determines what metadata you need; the metadata defines what the template can do. Do as much work as you can up front to make the metadata friendly to your templates.
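The metadata-plus-templates loop can be sketched roughly like this. Python’s `string.Template` stands in for XSLT here, and the metadata layout, the template set and the emitted C# are hypothetical; treat it as an illustration of the idea rather than how any particular tool works.

```python
from string import Template
import xml.etree.ElementTree as ET

# Hypothetical metadata; the element and attribute names are illustrative.
METADATA = """
<metadata>
  <table name="users" singular="User" plural="Users">
    <column name="Id" netType="int" />
    <column name="Name" netType="string" />
  </table>
</metadata>
"""

# One template per generated artifact; every template sees the same metadata.
TEMPLATES = {
    "${singular}.cs": Template(
        "public class ${singular}\n{\n${properties}}\n"),
    "${singular}Collection.cs": Template(
        "public class ${singular}Collection : "
        "System.Collections.Generic.List<${singular}> { }\n"),
}

def generate(metadata_xml):
    """Return {file name: source text} for every table/template pair."""
    output = {}
    for table in ET.fromstring(metadata_xml).iter("table"):
        properties = "".join(
            f"    public {c.get('netType')} {c.get('name')} {{ get; set; }}\n"
            for c in table.iter("column"))
        values = {"singular": table.get("singular"), "properties": properties}
        for name_pattern, template in TEMPLATES.items():
            file_name = Template(name_pattern).substitute(values)
            output[file_name] = template.substitute(values)
    return output

files = generate(METADATA)
for file_name, source in files.items():
    print(f"// {file_name}\n{source}")
```

Adding a comparer or any other auxiliary class is then a matter of adding one more entry to `TEMPLATES`; the metadata does not change.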
In a nutshell, my process follows the principles Kathleen has listed in her book:
Five Principles of Code Generation
- You have control of the templates that generate your code and can change or replace them as required.
- You collect metadata as a separate, distinct step with usable output that can independently evolve.
- You, or someone unfamiliar with the project, can regenerate your code precisely as a one-click process—now or at any point in the future.
- You embrace handcrafted code by isolating and protecting it. Code generation is a supporting player to human programming and doesn’t overwrite files unless they were generated and haven’t been edited.
- The code-generated application is a high-quality application. It allows more effective testing, has equal or better performance, and is more easily maintained than a similar fully handcrafted application.
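The fourth principle (never overwrite a file unless it was generated and hasn’t been edited) is the one that most obviously needs a mechanism. One possible approach, sketched below, is to stamp each generated file with a hash of its own body and refuse to regenerate when the stamp is missing or no longer matches. The header format and function names are my own invention for illustration, not something taken from the book.

```python
import hashlib
import tempfile
from pathlib import Path

MARKER = "// <auto-generated sha256="  # hypothetical header convention

def _stamp(body):
    """Prefix the body with a header line carrying the body's hash."""
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{MARKER}{digest}>\n{body}"

def is_untouched_generated(path):
    """True only if the file has our header and its body still matches it."""
    header, _, body = path.read_text(encoding="utf-8").partition("\n")
    if not header.startswith(MARKER):
        return False  # no header: treat the file as handcrafted
    return header == _stamp(body).partition("\n")[0]

def write_generated(path, body):
    """Write generated code unless that would clobber human edits."""
    if path.exists() and not is_untouched_generated(path):
        return False
    path.write_text(_stamp(body), encoding="utf-8")
    return True

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "UserBase.cs"
    first = write_generated(target, "class UserBase { }\n")          # new file
    second = write_generated(target, "class UserBase { int x; }\n")  # regenerated
    target.write_text(target.read_text(encoding="utf-8") + "// tweak by hand\n",
                      encoding="utf-8")
    third = write_generated(target, "class UserBase { }\n")          # refused
    print(first, second, third)
```

In this sketch a refused write simply returns `False`; a real harness would probably want to report the file so someone can decide what to do with the edits.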
Key Points
- Code generation is simply code that writes code.
- Code generation aids in the planning stage because it encourages iterative prototype-based requirements gathering. The code generation prototype is special because it’s capable of smoothly evolving into a robust permanent project element.
- Code generation needs a single monolithic metadata input.
- Metadata extraction takes you from some source to metadata usable for code generation.
- It’s important to make metadata extraction a distinct step you run and debug on its own. In other words, instead of going directly to the database when you need to know the columns in a table, retrieve them from an already-created XML file.
- You isolate the code you handcraft from the code built on templates so that regeneration doesn’t smash over your important handcrafted code. One of the key characteristics of code generation is respecting handcrafted code, and to do this you need to isolate and protect it.
- You’ll generally isolate class-specific code (i.e. handcrafted code) by placing it in derived classes.
- You derive a class containing your class-specific code from an autogenerated base class.
- Hold generated code to the same high-quality standards you use for the rest of your code, and treat it with the same care—including source control and testing.
- Your goal is one-click generation.
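The base-class/derived-class split from the points above can be shown in a few lines. The book targets .NET; what follows is only a Python analogue, with a made-up `User` entity and a made-up generator function, to show where the generated code ends and the handcrafted code begins.

```python
# Hypothetical metadata for one entity: column name -> default value.
USER_COLUMNS = {"id": 0, "name": ""}

def generate_base_class(class_name, columns):
    """Return Python source for a base class that can be regenerated at will."""
    lines = [f"class {class_name}Base:",
             "    # Generated code: regenerate instead of editing."]
    args = ", ".join(f"{col}={default!r}" for col, default in columns.items())
    lines.append(f"    def __init__(self, {args}):")
    lines += [f"        self.{col} = {col}" for col in columns]
    return "\n".join(lines) + "\n"

# In a real project this text would be written to its own file;
# here it is executed directly to keep the sketch self-contained.
generated_source = generate_base_class("User", USER_COLUMNS)
namespace = {}
exec(generated_source, namespace)
UserBase = namespace["UserBase"]

# Handcrafted code lives in the derived class, which regeneration never touches.
class User(UserBase):
    def display_name(self):
        return self.name.title() or f"user #{self.id}"

user = User(id=7, name="ada lovelace")
print(user.display_name())
```

If a column is added to the metadata, only `UserBase` changes on the next run; `User` and its `display_name` method stay exactly as written.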
Conclusion
I know this is a lengthy topic, and I have to stop somewhere. I’m hoping I got at least some of you excited about code generation.
As for the book, I read only five chapters, which are fascinating from a methodology standpoint. The rest of the book covers the specifics of implementing a homegrown codegen tool, and I simply didn’t want to go there. It also has a nice refresher on XSLT, XML schemas, CodeDOM and other good stuff!