Data Migration

A common mistake when planning a new program to replace a legacy one is to schedule the data migration as a short step at the end of the project.

This is a mistake for several reasons. In one respect, the data migration is more complex than the new system itself: it has to deal with the complexity of both systems. If the two are not very similar, mapping the data can be very difficult. Migrating some data as early as possible also forces the developers to test against real data, which exposes misunderstandings while they are still easy to correct. It also lets you show the client data that is familiar. Finally, migration is an opportunity to clean up the old data, a task that can itself be very difficult.

With my own systems, the data gets migrated as soon as the new tables are identified. This allows me to schedule reports and processing tasks before data capture, which reverses the conventional order. Cleaner data and new reporting might be of greater value to the business than data capture into an otherwise empty system. From the developer's point of view, the ability to reproduce the old system's reports is an important validation of the new design.
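As an illustration of using reports as validation, here is a minimal sketch that compares the figures from a legacy report with the same report reproduced from the migrated data. The function name `compare_reports`, the dictionary layout, and the sample figures are all assumptions made for this example; they do not come from any particular tool.

```python
from decimal import Decimal


def compare_reports(legacy, migrated, tolerance=Decimal("0.00")):
    """Return {key: (legacy_value, migrated_value)} for every row that differs.

    Both arguments map a report row key (for example a customer name)
    to a Decimal total. A key missing on one side is reported with None.
    """
    differences = {}
    for key in sorted(set(legacy) | set(migrated)):
        old_value = legacy.get(key)
        new_value = migrated.get(key)
        if old_value is None or new_value is None:
            differences[key] = (old_value, new_value)
        elif abs(old_value - new_value) > tolerance:
            differences[key] = (old_value, new_value)
    return differences


if __name__ == "__main__":
    # Hypothetical totals: one taken from the old system's report,
    # one computed from the migrated tables.
    legacy_totals = {"Acme": Decimal("10.00"), "Bolt": Decimal("5.50")}
    migrated_totals = {"Acme": Decimal("10.00"), "Bolt": Decimal("5.05")}
    for key, (old, new) in compare_reports(legacy_totals, migrated_totals).items():
        print(f"{key}: legacy={old} migrated={new}")
```

An empty result means the new design reproduces the old report. Each difference points to a mapping error, a misunderstanding of the old data, or a real fault in the legacy figures.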

The importance of data migration in my process has encouraged me to develop tools for this purpose. The main one is the DbConvert program. It analyzes the two databases and lets the consultant map one to the other by point and click. It handles many of the difficulties that arise. For example, it saves a mapping of the keys between data on the two systems, it allows data to be consolidated from many sources into a single destination, and conversely it allows data from a single source to be split across many destinations. The captured definition can be used to produce a specification document, which is automatically kept in step with the migration.
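To make the idea of a saved key mapping concrete, here is a minimal sketch using Python's standard `sqlite3` module. It is not how DbConvert works internally, which the text does not describe. The table names (`customers`, `orders`, `customer`, `sale`, `key_map`), the columns, and the sample rows are all hypothetical.

```python
import sqlite3


def build_demo_databases():
    """Create a tiny legacy database and an empty new database in memory."""
    old = sqlite3.connect(":memory:")
    old.executescript("""
        CREATE TABLE customers (cust_code TEXT PRIMARY KEY, cust_name TEXT);
        CREATE TABLE orders (cust_code TEXT, amount REAL);
        INSERT INTO customers VALUES ('A001', 'Acme '), ('B002', 'Bolt');
        INSERT INTO orders VALUES ('A001', 10.0), ('B002', 5.5), ('Z999', 1.0);
    """)
    new = sqlite3.connect(":memory:")
    new.executescript("""
        CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE sale (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
        CREATE TABLE key_map (
            table_name TEXT, old_key TEXT, new_key INTEGER,
            PRIMARY KEY (table_name, old_key)
        );
    """)
    return old, new


def migrate(old, new):
    """Copy customers and orders from old to new; return orphaned legacy keys."""
    # Pass 1: parent rows. Store each legacy key beside the key the
    # new system assigned, cleaning the name on the way through.
    rows = old.execute(
        "SELECT cust_code, cust_name FROM customers ORDER BY cust_code"
    ).fetchall()
    for code, name in rows:
        cur = new.execute("INSERT INTO customer (name) VALUES (?)", (name.strip(),))
        new.execute(
            "INSERT INTO key_map (table_name, old_key, new_key) VALUES ('customer', ?, ?)",
            (code, cur.lastrowid),
        )

    # Pass 2: child rows. Translate the legacy foreign key through key_map.
    orphans = []
    rows = old.execute("SELECT cust_code, amount FROM orders ORDER BY rowid").fetchall()
    for code, amount in rows:
        match = new.execute(
            "SELECT new_key FROM key_map WHERE table_name = 'customer' AND old_key = ?",
            (code,),
        ).fetchone()
        if match is None:
            orphans.append(code)  # no such customer in the legacy data
            continue
        new.execute(
            "INSERT INTO sale (customer_id, amount) VALUES (?, ?)", (match[0], amount)
        )
    new.commit()
    return orphans


if __name__ == "__main__":
    old_db, new_db = build_demo_databases()
    print("orphaned order keys:", migrate(old_db, new_db))
```

Keeping the mapping in a table means the migration can be rerun table by table, and orphaned references surface as a list to clean up instead of failing silently.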

In short, DbConvert is a good 90% solution.