Generating code - a personal history

Generating Code

The early days

In the late 1980s I attended a course on a new database system. As usual, when I got back to work I decided to build an application to cement the new knowledge I had gained. I decided that a suitable project was to create a database to hold the configuration data for the TP Monitor. It would be useful to be able to produce non-standard reports from it. I was using Cobol, which was the standard application programming language on these computers at the time. I did some experiments, writing the code to load one or two tables. When that worked, I wrote the full program. Unfortunately it failed once I had written the code to load all the tables. I had hit a resource limit in the database engine.

I reported the problem, but the fix would take too long to come through. I did some more experiments and found another way that avoided the problem. Unfortunately I would need to rewrite my whole program. Cobol is not very flexible. Life is too short, and I had other things to do. I vowed that this would not happen to me again.

Examining my program, I saw that it consisted of repetitions of very much the same pattern. To fix the program, I would have to replace these with a somewhat different pattern. Being a clever programmer, I decided to write a program that would produce the repeated code from a mould, with the specific adjustments for each case. This is a very old idea, used in mail-merge programs, for example. The program could then be fixed by changing the mould and re-generating the program.

I did not like the idea of generating bits and pieces and then combining them. My program had to generate everything in one go. I designed a very simple language for my parameter file. It serves a similar function to XML. It was a text file so that I could easily edit it with an ordinary text editor.

Once I had my generator, it turned out to be very much more useful than expected. Part of my job was to investigate and solve the poor performance of databases. My generator allowed me to quickly set up complete coverage of all the work being done by the database, then to zoom in on those parts that were causing the problems.

Later I got a consulting job to load data from a legacy Fortran system into a software package. The official opinion was that this was impossible: Cobol could not read Fortran data. This was nonsense. Fortran just arranged things differently. Fortunately I was able to use my client's documentation of the Fortran files to create my input parameters. Had that documentation not been available, I would have done the same from the Fortran programs themselves, albeit with more effort. The result was an enormous Cobol program that created Cobol versions of all the Fortran files. In a two-month project I managed to produce a 50,000-line Cobol program that was completely bug free.

This established the pattern for much of my later work. A combination of analyzing the structure of a system, together with my generator, allowed me to produce a large, high quality program efficiently. My aim always was that the resulting program should be one that I would be proud to have written conventionally. It should look good, and be easy for another programmer to take over without the generator.

Moving to personal computers

By the early 1990s I was an independent software consultant, developing client-server and other software in a Microsoft Windows environment. One of the unfortunate things about being an independent consultant is that you sometimes have much too much time on your hands. I made use of this enforced idleness to redevelop the generator for Windows. My first application was a program generator. It examined a database and created a program with edit screens for all the tables in the database. It was rather ugly, but it was an important proof of concept. I also created a powerful editor for defining and capturing parameter files, and for debugging the profiles.

Later projects

In the early days I tried to use my generator to automate the entire programming task, but I now believe that to be a fool's game. I now use it as scaffolding to help build a program. The scaffolding is removed when the building can stand by itself.

The way ahead seems to be with "fill in the blank" type systems from which a program gets built. The most important example of this is a data migration tool that allows the user to map the way data gets transformed when it is moved from one database to another. I am very slowly preparing the third version now.

One project was a huge and interesting challenge. I had to split a large and complex database into two. My client was creating a new international division and was obliged to maintain the client information separately. Given an impossible deadline, using the generator was the only possibility. The final SQL script was 5000 lines long, which is enormous. I wrote a program that looked through the database for tables that were dependent on each other. Fortunately the designers had used a naming convention that made this possible. The program was a mixture of manual decisions and automation, and resulted in a parameter file representing the dependencies in the database. From this I could generate the script that loaded the data into the new database in the correct order, and removed it from the old database in the reverse order. Amazingly enough I got it right first time: later experience showed that I had not missed any data.
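The ordering step can be sketched roughly as follows. This is only an illustration in Python, not the original program: the table names, the `depends_on` map and the database names `old_db` and `new_db` are invented, and the generated SQL is simplified (a real split would add a WHERE clause selecting only the rows being moved). It relies on `graphlib.TopologicalSorter` from the Python standard library.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: each table lists the tables it refers to,
# which must therefore be loaded before it.
depends_on = {
    "client": [],
    "account": ["client"],
    "orders": ["client", "account"],
    "order_line": ["orders"],
}


def load_order(dependencies):
    """Return the tables with every parent ahead of its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


def generate_script(dependencies, source="old_db", target="new_db"):
    """Generate copy statements in load order, then deletes in reverse order."""
    order = load_order(dependencies)
    # Simplified SQL: a real split would restrict the rows with a WHERE clause.
    lines = [f"INSERT INTO {target}.{t} SELECT * FROM {source}.{t};" for t in order]
    lines += [f"DELETE FROM {source}.{t};" for t in reversed(order)]
    return "\n".join(lines)


print(generate_script(depends_on))
```

If the dependency map contains a cycle, `static_order()` raises `graphlib.CycleError`, which is the point at which a manual decision would be needed.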

For me, this showed good management. Other than the deadline, I was in complete control of the project. I was not asked to justify my decisions, and there were no committees using up my time. Of course, had this gone wrong, my head would have been on the block.


The migration tools have been the most useful demonstrations of the potential of the generator. I am using the new version to integrate the generator into JamFram, my application framework. I need the generator in JamFram because some things will work best with 4GL techniques.


TP Monitor

A TP (Transaction Processing) Monitor is part of the operating software of a mainframe computer. Examples are CICS on IBM mainframes, and COMS on Unisys mainframes. It handles messages coming from outside the computer, say from end users. These are identified, perhaps pre-processed and passed on to the appropriate program. The reply, which could be an error message, is then passed back to the end user, again with possible reprocessing. The messages, routing, programs and reprocessing instructions form part of the TP Monitor's configuration.
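The flow just described can be sketched in a few lines of Python. This is a toy model only: the transaction codes, the handler functions and the `ROUTES` table are invented for illustration and say nothing about how CICS or COMS are actually configured.

```python
def strip_blanks(text):
    """Example pre-processing step: remove padding around the message body."""
    return text.strip()


def to_upper(text):
    """Example processing step applied to a reply."""
    return text.upper()


def balance_program(body):
    """Stand-in for an application program; the amount is made up."""
    return f"balance for {body} is 100.00"


def echo_program(body):
    return body


# Hypothetical configuration:
# transaction code -> (pre-process, program, process reply)
ROUTES = {
    "BAL": (strip_blanks, balance_program, to_upper),
    "ECHO": (strip_blanks, echo_program, None),
}


def handle(message):
    """Identify a message, route it to its program and return the reply."""
    code, _, body = message.partition(" ")
    route = ROUTES.get(code)
    if route is None:
        # The reply can itself be an error message.
        return f"ERROR: unknown transaction {code!r}"
    pre, program, post = route
    if pre:
        body = pre(body)
    reply = program(body)
    return post(reply) if post else reply


print(handle("BAL   12345 "))
print(handle("XYZ hello"))
```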

These ideas have been reinvented on PC networks, with web servers and middle tier software, etc.


Cobol's flexibility

Cobol is a very archaic language, having first seen the light of day in the 1950s. It has been enhanced and extended over the years, but still retains much of its original character. One weakness compared to more modern languages, even older ones such as Fortran and assembler, is the inability to pass parameters to a subroutine. This meant that global variables had to be used to pass information to the subroutine, and that the subroutines were very concrete. Modern languages allow the programmer to write more general purpose code. The opportunities to re-use code in Java or C# are very much greater. In Cobol this was pretty well impossible.

I am told that more recent versions of Cobol are object oriented, and presumably do not have this problem. I have given up on the language as hopeless.

The initial motivation for writing a code generator was to deal with this inflexibility.



Mail-merge

Mail-merge is usually built into word processors these days. In the old days it was a stand-alone program. A mail-merge program is given two inputs: a template document, and a parameter file. The template is marked up (to use the technical jargon) with placeholders such as &firstname. The parameter file consists of records with named fields, such as firstname. The mail-merge program processes each record, and replaces the placeholders with the corresponding data values, in this case &firstname with John. A document is created, printed, or e-mailed for each parameter record.
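A minimal sketch of such a mail-merge, assuming Python and the &name placeholder style described above. The function name `mail_merge`, the template text and the sample records are invented for illustration and do not come from any real mail-merge program.

```python
import re

# Placeholders look like &firstname: an ampersand followed by a field name.
PLACEHOLDER = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)")


def mail_merge(template, records):
    """Return one merged document per parameter record."""
    documents = []
    for record in records:
        # Replace each placeholder with the record's value for that field.
        # Unknown fields are left untouched so that mistakes stay visible.
        def substitute(match, record=record):
            return str(record.get(match.group(1), match.group(0)))

        documents.append(PLACEHOLDER.sub(substitute, template))
    return documents


template = "Dear &firstname &surname,\nYour order &orderno has shipped.\n"
records = [
    {"firstname": "John", "surname": "Smith", "orderno": 1001},
    {"firstname": "Mary", "surname": "Jones", "orderno": 1002},
]

for document in mail_merge(template, records):
    print(document)
```

Leaving unknown placeholders in place is a design choice for this sketch; a stricter tool might stop with an error instead.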

Fourth Generation (4GL) programming tools do a similar thing. You can think of a 4GL as a specification capturing system. The program files, database definitions, TP Monitor configurations, etc are created from the specification when the system is generated. A 4GL system will have a number of templates which can be mail-merged to produce the desired code. This code may need to be combined into single source code files. For example, if the target language is Cobol, the data definitions and program code will need to be generated separately and combined into a single file because of the structure of a Cobol program.

The idea of generating code is itself much older. Experimental languages were often implemented by creating code in existing languages. This was most often Fortran in the early days. The first implementation of C++ was done in this way. This approach has huge advantages over producing machine code directly. For one thing, producing efficient object code is a very specialized undertaking, best left to experts. For another, code in a high level language can be read by more people, and hence debugged more readily. Of course, most of these systems produce really ugly code! Personally, I want my generated code to look as if I had written it the old-fashioned way.

A disadvantage of generated code is the temptation for a programmer to change the code. Once a change has been made, the code cannot be re-generated as the change will be lost. Users of generated code need to be aware of this possibility. This is a system design issue.


Generating everything

This aim makes the programming task more difficult than those normally faced by application programmers, which is probably why program generators usually do not work this way. In most generators, the generation of the different pieces and their integration is handled in an ad hoc way. This does not matter, as the generator is specific to one product and the work is a one-off. I wanted a general purpose generator, and I did not want to do these things every time I used the generator in a new way.

Consider a Cobol program. It has a data division. In this we define our files, and separately, the working storage. These will need to be generated separately. In the procedure division we will need to generate separate pieces of code for each part of the program, and another higher level piece that co-ordinates the execution of the lower level pieces.

The generator would have to work its way repeatedly through the parameter file and do repeated mail-merges at each level of the data. Which data is used depends on the context within the program. Things get more complicated when we want to do things conditionally, or different things depending on the data.

You get the idea.
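To make the shape of the problem concrete, here is a small sketch in Python. It is not the generator described in this article: the moulds, the two-level parameter data and the function names are all invented, and the output is a Cobol-like fragment rather than a program that would compile. It only shows one set of parameter data being walked several times, at different levels, to produce different sections of a single output file.

```python
import re

# Placeholder names are lower case, so &table-REC expands to CLIENT-REC.
PLACEHOLDER = re.compile(r"&([a-z_]+)")


def merge(mould, values):
    """Replace each &name placeholder in mould with values[name]."""
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), mould)


# Hypothetical parameter data with two levels: tables and their fields.
tables = [
    {"table": "CLIENT", "fields": [
        {"field": "CLIENT-ID", "picture": "9(6)"},
        {"field": "CLIENT-NAME", "picture": "X(30)"},
    ]},
    {"table": "ORDERS", "fields": [
        {"field": "ORDER-NO", "picture": "9(8)"},
    ]},
]

RECORD_MOULD = "       01  &table-REC.\n"
FIELD_MOULD = "           05  &field  PIC &picture.\n"
CALL_MOULD = "           PERFORM LOAD-&table.\n"
LOAD_MOULD = ("       LOAD-&table.\n"
              "           PERFORM READ-&table UNTIL &table-EOF.\n")


def generate(tables):
    """Make several passes over the same parameter data, one per section."""
    out = ["       DATA DIVISION.\n", "       WORKING-STORAGE SECTION.\n"]
    for table in tables:                 # pass 1: one record layout per table
        out.append(merge(RECORD_MOULD, table))
        for field in table["fields"]:    # inner level: one line per field
            out.append(merge(FIELD_MOULD, field))
    out += ["       PROCEDURE DIVISION.\n", "       MAIN-PARA.\n"]
    for table in tables:                 # pass 2: the co-ordinating paragraph
        out.append(merge(CALL_MOULD, table))
    out.append("           STOP RUN.\n")
    for table in tables:                 # pass 3: one paragraph per table
        out.append(merge(LOAD_MOULD, table))
    return "".join(out)


print(generate(tables))
```

Here the passes are hard-coded in `generate`, which is exactly the ad hoc approach; a general purpose generator would have to drive the same passes from the profile itself.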



XML

My parameter language predates XML by some time, and if I were to do this again, I would probably use XML. I have investigated changing my generator to use XML but that would entail more effort than I am prepared to invest. There are also performance issues. XML is a more complex language than my own and is slower to parse. I have optimized my parser to the extent that it is very fast. This is very important for the kind of applications I now have. My parameter files are also much more compact than XML, which is important to me.

I lose some flexibility, especially with free format text. There are ways around that problem, so I will live with it.

The profile (template or mould) could also be done in XML with advantage. Once again it would entail more work than I am prepared to invest.