I am working on a product that uses Hibernate and that, like most enterprise software, requires DB (schema and data) migration.
Hibernate’s schema create/update feature (the hibernate.hbm2ddl.auto property) has its limitations, and certain complex DB migration requirements will always require custom coding.
So I had to set hibernate.hbm2ddl.auto=validate (a minimal configuration sketch follows the list below) due to the following known limitations of this Hibernate feature:
- Cannot drop a table or a column
- Cannot change a constraint on a column
- Cannot add a column with a not-null constraint to an existing table
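For reference, here is a minimal sketch of applying the validate setting programmatically with a plain Hibernate Configuration; the class name and the mapping resource are just placeholders, not code from the product:

```java
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class HibernateBootstrap {

    public static SessionFactory buildSessionFactory() {
        Configuration configuration = new Configuration()
                .addResource("com/example/Order.hbm.xml"); // placeholder mapping

        // Only validate the schema at startup; never create or alter it.
        // Schema and data changes are handled by the migration framework instead.
        configuration.setProperty("hibernate.hbm2ddl.auto", "validate");

        return configuration.buildSessionFactory();
    }
}
```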
The possible approaches/alternatives for DB migration that I could think of were:
- Build a custom DB migration framework that uses separate SQL (DDL & DML) scripts and/or ETL scripts (Java/Ruby/Python/Groovy) – flexible and efficient, but could become unmanageable.
- ETLM – a lot of work required, the process is decoupled from the application, and a custom ETLM process is needed for every upgrade.
- Third-party DB upgrade/change-management tools.
The other design decision was to make the application “smart” so that it:
- Maintains the version and executed-script information so that old scripts are not executed again on every upgrade.
- Performs the necessary incremental upgrades based on the version number of the newly installed release (read from the Jar manifest, as sketched after this list) and the last installed version (read from the DB Version table).
- Requires no effort from the end-user of the application for upgrading the application.
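As an illustration of the manifest-based version check, here is a minimal sketch; the class name is hypothetical, and the use of Implementation-Version and the "-1" fallback for dev builds follow the conventions described later in this post:

```java
public class InstalledVersion {

    // Hypothetical helper: reads Implementation-Version from the Jar manifest
    // of the package this class was loaded from. Dev builds launched from an
    // IDE have no manifest entry, so the version falls back to "-1".
    public static String read() {
        Package pkg = InstalledVersion.class.getPackage();
        String version = (pkg != null) ? pkg.getImplementationVersion() : null;
        return (version != null) ? version : "-1";
    }
}
```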
I chose approach #1 listed above. The following section details the framework for Database Data and Schema Migration:
- This requires at least two new tables, “Version” and “MigrationScript”, to hold the last upgraded version and the set of all migration scripts successfully executed so far.
- Before making any schema or data changes for migration, the database needs to be locked. It is unlocked after all the migration changes are done.
- The main application Jars will have a Manifest entry that indicates the version of the software that is installed. So, the application needs to migrate the database from the version recorded in the Version table to the installed version by applying the necessary scripts.
- I designed file and directory naming conventions for the script names:
- File extensions:
- SQL DDL –> .ddl
- SQL DML –> .dml
- Java/Ruby/Python/Groovy ETL Scripts –> .txt (This text file will have the fully qualified Java Class name or the name of the Ruby/Python/Groovy script to be executed. The scripts themselves are assumed to be on the classpath either directly or inside a Jar file).
- File and directory naming conventions: the directories (that hold the migration scripts) are named after the major versions of the software and the target database, with a “common” directory for all database-agnostic scripts (typically simple DML SQL). The files in those directories are named with a 4-digit number prefix to make them sortable, so the DB migration module executes the scripts in alphabetical order (see the runner sketch after this list). The number prefix leaves the developer free to name the rest of the script whatever they like (for example, 0001_create_tables_for_baseline.ddl). This makes it possible to mix all kinds of the above scripts (DDL, DML & ETLM) in one upgrade, which makes this approach very flexible and powerful.
- All of the scripts are packaged in a Jar (or a Zip) file and deployed such that it is available in the classpath (as a resource).
- The DB migration module shall update the Version table with the new version/revision number and the MigrationScript table with the names of all successfully executed scripts.
- The Java classes specified in the Java ETLM scripts will have to implement a “Migrator” interface (sketched after this list), which is called from within the DB migration framework. This is necessary for complex migrations.
- The framework also needs to take “Dev” builds vs. “Production” builds into consideration. The “Dev” builds (typically executed from an Eclipse launcher) may not have the correct Manifest entries, so the build version is set to “-1”. The “Production”/QA builds are run using ANT, which will have the right version info.
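For illustration, here is a minimal sketch of what the “Migrator” interface might look like; the interface name comes from this post, but the method signature and the use of a plain JDBC Connection are my assumptions:

```java
import java.sql.Connection;

// Hypothetical contract for Java ETLM scripts: the framework instantiates the
// class named in the .txt script and calls migrate() during the upgrade run.
public interface Migrator {

    // Perform a complex migration step using the application's DB connection.
    // Throwing an exception aborts the upgrade, leaving the script unrecorded.
    void migrate(Connection connection) throws Exception;
}
```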
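Putting the naming convention, the sorted execution order and the bookkeeping tables together, a stripped-down runner might look like the sketch below. All class, method and table names are assumptions, not the product’s actual code; a real implementation would also handle the DB lock, transactions, per-database script directories and splitting SQL scripts into individual statements:

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.Statement;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

public class MigrationRunner {

    private final Connection connection;

    public MigrationRunner(Connection connection) {
        this.connection = connection;
    }

    // pendingScripts: classpath resource names of the not-yet-executed scripts
    // for the versions between the DB version and the installed version.
    public void upgrade(List<String> pendingScripts, String installedVersion) throws Exception {
        // Alphabetical sort: the 4-digit prefix (e.g. 0001_create_tables_for_baseline.ddl)
        // fixes the execution order.
        Collections.sort(pendingScripts);

        for (String script : pendingScripts) {
            if (script.endsWith(".ddl") || script.endsWith(".dml")) {
                executeSql(readResource(script));              // plain SQL script
            } else if (script.endsWith(".txt")) {
                runMigratorClass(readResource(script).trim()); // Java ETLM script
            }
            recordExecutedScript(script);                      // INSERT INTO MigrationScript ...
        }
        updateVersionTable(installedVersion);                  // UPDATE Version ... (done last)
    }

    private String readResource(String name) {
        InputStream in = getClass().getResourceAsStream("/" + name);
        return new Scanner(in).useDelimiter("\\A").next();
    }

    private void executeSql(String sql) throws Exception {
        // Simplified: a real runner would split the script on the statement delimiter.
        Statement stmt = connection.createStatement();
        try {
            stmt.execute(sql);
        } finally {
            stmt.close();
        }
    }

    private void runMigratorClass(String className) throws Exception {
        // The .txt script holds a fully qualified class name implementing Migrator.
        Migrator migrator = (Migrator) Class.forName(className).newInstance();
        migrator.migrate(connection);
    }

    private void recordExecutedScript(String script) { /* INSERT INTO MigrationScript ... */ }

    private void updateVersionTable(String version) { /* UPDATE Version ... */ }
}
```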
One of the major problems with the above approach is that the developers have to check in the scripts (if DB-specific) for all of the supported databases separately (into separate directories). I created an ANT task that generates the DDL from a given Hibernate mapping configuration; this serves as a starting point for new tables. On the flip side, this forces the developers to understand the tables they are creating and to make explicit decisions. Also, testing migration scripts against several databases can be challenging, but a continuous integration build set up for each DB should take care of this problem.
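For the DDL-generation step, a small sketch using Hibernate 3’s SchemaExport (the class behind the hbm2ddl tooling) could look like the following; the ANT wiring is omitted, and the mapping resource, dialect and file names are placeholders rather than the task I actually built:

```java
import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaExport;

public class GenerateDdl {

    public static void main(String[] args) {
        // Load the Hibernate mapping configuration and pick the target dialect
        // (one run per supported database, each feeding its own script directory).
        Configuration configuration = new Configuration()
                .addResource("com/example/Order.hbm.xml")
                .setProperty("hibernate.dialect", "org.hibernate.dialect.Oracle10gDialect");

        SchemaExport export = new SchemaExport(configuration);
        export.setOutputFile("target/baseline.ddl"); // starting point for the checked-in script
        export.setDelimiter(";");

        // script=true prints the DDL; export=false skips executing it against a live DB.
        export.create(true, false);
    }
}
```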
Next, I shall blog about the implementation to force Hibernate to use certain Database mappings and override the default Hibernate SQL type mappings.