Hibernate and DB migration

I am working on a product that uses Hibernate and, like most enterprise software, requires DB (schema and data) migration.

Hibernate’s schema create/update feature (driven by the hibernate.hbm2ddl.auto property) has its limitations. Also, certain complex DB migration requirements will definitely require custom coding.

So, I had to set hibernate.hbm2ddl.auto=validate (a configuration sketch follows the list below) due to the following known limitations of this Hibernate feature:

  1. Cannot drop a table or a column
  2. Cannot change a constraint on a column
  3. Cannot add a column with a not-null constraint to an existing table
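
For completeness, a minimal sketch of the resulting configuration, assuming a Hibernate 3.x-style programmatic bootstrap (the property can equally be set in hibernate.cfg.xml or hibernate.properties):

```java
import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

public class ValidatingBootstrap {
    public static void main(String[] args) {
        // Reads hibernate.cfg.xml from the classpath
        Configuration cfg = new Configuration().configure();
        // Equivalent to hibernate.hbm2ddl.auto=validate in the config file:
        // Hibernate only checks the schema against the mappings and fails
        // fast on a mismatch, instead of trying to create/alter tables.
        cfg.setProperty("hibernate.hbm2ddl.auto", "validate");
        SessionFactory sessionFactory = cfg.buildSessionFactory();
        sessionFactory.close();
    }
}
```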

The possible approaches/alternatives for DB migration that I could think of were:

  1. Build a custom DB migration framework that uses separate SQL (DDL & DML) scripts and/or ETL scripts (Java/Ruby/Python/Groovy) – flexible and efficient, but could become unmanageable.
  2. ETLM – a lot of work required, the process is decoupled from the application, and a custom ETLM process is needed for every upgrade.
  3. Third party DB Upgrade/Change management tools:
    • Liquibase – XML-based (not very useful for applications that do not maintain their DB schema in XML)
    • AutoPatch
    • DBMigrate (Is this production-ready?)

The other design decision was to make the application “smart” so that it:

  • Maintains the version and executed-script info so that old scripts are not executed again on every upgrade.
  • Performs the necessary incremental upgrades based on the version number of the newly installed upgrade (read from the Jar manifest files; see the sketch after this list) and the last installed version (read from the DB Version table).
  • Requires no effort from the end-user of the application for upgrading the application.
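
A minimal sketch of the version lookup, assuming the build stamps Implementation-Version into the Jar manifest (the class name is illustrative):

```java
public class InstalledVersionReader {

    /**
     * Reads the installed version from the Jar manifest's
     * Implementation-Version attribute. Returns "-1" when the code is not
     * running from a packaged Jar (e.g. a "Dev" build launched from
     * Eclipse), matching the convention described later in this post.
     */
    public static String readInstalledVersion() {
        String version = InstalledVersionReader.class.getPackage()
                .getImplementationVersion();
        return (version != null) ? version : "-1";
    }
}
```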

I chose approach #1 listed above. The following section details the framework for Database Data and Schema Migration:

  • This will require at least two new tables, “Version” and “MigrationScript”, to hold the last upgraded version and the set of all migration scripts successfully executed so far.
  • Before making any changes to the Schema or data for migration, the Database needs to be locked. It will be unlocked after all the migration changes are done.
  • The main application Jars will have a Manifest entry that indicates the installed version of the software. The application then needs to migrate the database from the version recorded in the Version table to the installed version by applying the necessary scripts.
  • Designed File and Directory naming conventions for the script names:
    • File extensions:
      • SQL DDL –> .ddl
      • SQL DML –> .dml
      • Java/Ruby/Python/Groovy ETL Scripts –> .txt (This text file will have the fully qualified Java Class name or the name of the Ruby/Python/Groovy script to be executed. The scripts themselves are assumed to be on the classpath either directly or inside a Jar file).
    • File and directory naming conventions: the directories (which hold the migration scripts) are based on the major versions of the software and the target database. There is a “common” directory for all database-agnostic scripts (typically simple DML SQL). The files in those directories are named with a 4-digit number prefix that drives the sort order, so the DB migration module executes the scripts after sorting them alphabetically. The prefix leaves the developer free to name the rest of the script file as desired (for example, 0001_create_tables_for_baseline.ddl). This makes it possible to mix all of the above script types (DDL, DML & ETLM), which makes the approach very flexible and powerful.
  • All of the scripts are packaged in a Jar (or a Zip) file and deployed such that they are available on the classpath (as resources).
  • The DB migration module shall update the DB Version table with the new version/revision number and the MigrationScript table with all the successfully executed script names (a condensed sketch of this module follows this list).
  • The Java class names specified in a Java ETLM script will have to implement a “Migrator” interface, which is called from within the DB migration framework; this is necessary for complex migrations (an interface sketch also follows this list).
  • The framework will also need to take into consideration “Dev” builds vs. “Production” builds. The “Dev” builds (typically executed from an Eclipse launcher) may not have the correct Manifest entries, so the build version is set to “-1”. The “Production”/QA builds are run using ANT, which stamps the right version info.
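
To tie the above together, here is a condensed, hypothetical sketch of the migration module's main loop. The collaborator interfaces (VersionDao, ScriptRepository, ScriptRunner) are illustrative stand-ins for the framework's internals, not its actual API:

```java
import java.util.Collections;
import java.util.List;

public class DbMigrationModule {

    // Illustrative collaborators (assumptions, not the real framework API):
    interface VersionDao {
        int lastUpgradedVersion();
        void lockDatabase();
        void unlockDatabase();
        void recordExecutedScript(String name);
        void updateVersion(int version);
    }

    interface ScriptRepository {
        // Pending scripts between two versions, gathered from the "common"
        // directory plus the directory for the target database
        List<String> pendingScripts(int fromVersion, int toVersion);
    }

    interface ScriptRunner {
        void execute(String scriptName) throws Exception; // .ddl, .dml or .txt (ETL)
    }

    private final VersionDao versionDao;
    private final ScriptRepository scripts;
    private final ScriptRunner runner;
    private final int installedVersion; // from the Jar manifest; -1 for "Dev" builds

    DbMigrationModule(VersionDao versionDao, ScriptRepository scripts,
                      ScriptRunner runner, int installedVersion) {
        this.versionDao = versionDao;
        this.scripts = scripts;
        this.runner = runner;
        this.installedVersion = installedVersion;
    }

    public void migrate() throws Exception {
        int current = versionDao.lastUpgradedVersion();
        if (installedVersion <= current) {
            return; // already up to date (or a "Dev" build with version -1)
        }
        versionDao.lockDatabase(); // block concurrent upgrades
        try {
            List<String> pending = scripts.pendingScripts(current, installedVersion);
            Collections.sort(pending); // the 4-digit prefix drives the order
            for (String script : pending) {
                runner.execute(script);
                versionDao.recordExecutedScript(script);
            }
            versionDao.updateVersion(installedVersion);
        } finally {
            versionDao.unlockDatabase();
        }
    }
}
```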
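And a minimal sketch of the “Migrator” contract mentioned above; the method signature is an assumption, since the real interface may differ:

```java
import java.sql.Connection;

public interface Migrator {

    /**
     * Performs one complex migration step. The framework hands the
     * implementation a JDBC connection so it can read old data, run it
     * through application logic, and write the transformed rows back.
     */
    void migrate(Connection connection) throws Exception;
}
```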

One of the major problems with the above approach is that developers have to check in DB-specific scripts for each supported database separately (into separate directories). I created an ANT task that generates the DDL from a given Hibernate mapping configuration, which serves as a starting point for new tables (see the sketch below). On the flip side, this forces developers to understand the tables they are creating and to make explicit decisions. Also, testing migration scripts against several databases can be challenging, but a continuous integration build per database should take care of this problem.
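
The ANT task itself is not shown here, but the equivalent DDL generation can be sketched with Hibernate's standard SchemaExport tool (Hibernate 3.x API; the output file name follows the naming convention above):

```java
import org.hibernate.cfg.Configuration;
import org.hibernate.tool.hbm2ddl.SchemaExport;

public class BaselineDdlGenerator {
    public static void main(String[] args) {
        // Load the Hibernate mapping configuration from the classpath
        Configuration cfg = new Configuration().configure();
        SchemaExport export = new SchemaExport(cfg);
        export.setOutputFile("0001_create_tables_for_baseline.ddl");
        export.setDelimiter(";");
        // create(script=true, export=false): write the DDL to the output
        // file (and stdout) without touching the database.
        export.create(true, false);
    }
}
```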

Next, I shall blog about how to force Hibernate to use specific database mappings and override the default Hibernate SQL type mappings.



9 Responses to Hibernate and DB migration

  1. Hi Shashi,
    This is a great writeup of the problem. I’m curious: did you get your proposed solution working? Did you find any other resources that helped?

    My organization has similar problems, but we use hand-coded SQL migration scripts run by an operator to handle the migrations. I like how your solution has the opportunity to empower the developers to test the process.

    thanks,
    Jason

  2. admin says:

    Hi Jason:

    Thank you.

    I was able to get this solution working in a couple of products at my current job. One of them is a large web application and the other is a dual-API SDK. I think it has been working okay so far.

    There were not many online resources that helped me. However, some of this was inspired by:
    http://code.google.com/p/dbmigrate/

    Some of the issues we faced while implementing the above solution were:
    1. Whenever both DDL and DML migration scripts had to be executed for a data/schema migration, we had to make sure that the DDL changes ran outside the transaction while the DML changes ran in a single transaction (see the JDBC sketch after this list). MS SQL Server presented us with some weird issues here.

    2. Migration scripts need to be tested really well before incorporating them into the product, as rollback after a failure is impossible when DDL changes are involved, and adding compensatory changes for rollback is complicated.

    3. After several minor releases (we follow Agile, so there could be a “minor” release almost every Sprint), you may end up with a lot of migration scripts. After some “minor” releases, we rolled up all of the DDL migration changes into the base DDL script after confirming the customer migration paths with the product owner. This helped clean up a lot of old migration scripts.

    4. It would be nice if all the migration scripts were idempotent. This may be hard to achieve.
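
    As an aside on point 1, the split looks roughly like this in plain JDBC (a simplified sketch, not our framework's actual code):

    ```java
    import java.sql.Connection;
    import java.sql.Statement;
    import java.util.List;

    public class DdlDmlSplit {

        /** DML scripts run inside a single transaction, rolled back on failure. */
        static void runDml(Connection conn, List<String> dml) throws Exception {
            conn.setAutoCommit(false);
            try (Statement st = conn.createStatement()) {
                for (String sql : dml) {
                    st.execute(sql);
                }
                conn.commit();
            } catch (Exception e) {
                conn.rollback();
                throw e;
            }
        }

        /** DDL statements run with auto-commit on, outside the DML
            transaction, since most databases cannot roll back DDL anyway. */
        static void runDdl(Connection conn, List<String> ddl) throws Exception {
            conn.setAutoCommit(true);
            try (Statement st = conn.createStatement()) {
                for (String sql : ddl) {
                    st.execute(sql);
                }
            }
        }
    }
    ```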

    Good luck to you..!

    Cheers,
    Shashi

  3. Itamar says:

    Hi there.
    There’s a third-party tool for SQL version control:
    http://www.nobhillsoft.com/Randolph.aspx

    What do you think of it ?

  4. Petr Gladkikh says:

    As far as I understand, Liquibase requires you to maintain XML for the changesets only (the initial DB state can be taken as a given). So I do not see how your approach would be simpler compared to that.

    As a side note, I have implemented a procedure similar to yours for a PHP project (update scripts were applied as part of the RPM package update procedure). If the difference in schema versions was more than one patch, then several patches were applied in turn. The results were satisfying; however, the number of schema changes was quite low.

  5. admin says:

    Hi Itamar,

    “Randolph” looks very interesting. I like the fact that it is decoupled from the application and stores the history of DB changes. It seems very useful for tracking the changes to the DB schema over a period of time.

    Does it help in “data migration” as well?
    One of the migration requirements involved some application logic that could not be handled through just SQL. Does this tool provide a way to use any other scripting language to handle such complicated data migrations?

    Cheers,
    Shashi

  6. admin says:

    Hi Petr,

    If the schema changes are simple and infrequent, then the approach I used here might be overkill. So, I agree with you.

    However, if you want a robust tool to support non-trivial data and schema migrations that involve “application logic” to script complex “transformations”, it might be very difficult to handle that with just XML changesets. That is where the power of scripting support becomes very handy.

    Cheers,
    Shashi

  7. Itamar says:

    Shashi, sorry for the late reply.. ;)

    Randolph keeps track of all the schema changes. It can also keep track of data changes, but that significantly slows the process down, as you can imagine. Some people do use the data tracking, I think, but mostly for smaller tables – usually ‘reference’ or ‘configuration’ tables. They are working on improving that feature: instead of periodic scans for changes, an ‘Audit Trail’ capability (it sets up triggers on the tables whose data you want to keep), but that’s not out yet.

    As for support for scripting languages: when you wish to migrate, it generates a SQL script that does all the migration, and you can plug in any script you like there. Or maybe I don’t understand your question?

  8. admin says:

    Itamar, thanks for the follow-up.

    We had some requirements to migrate rows in some old data (non-reference/config) tables across Oracle, SQL Server & MySQL databases. This meant we had to run the old data through a complex “Computation Engine” built within the application, which contains a lot of business logic, so it was almost impossible to do this in SQL scripts alone. Instead of re-writing this logic in migration scripts, we called the actual Java application code from the migration scripts (Ruby via JRuby, Python via JPython, etc.).

    The downside of this approach is that the migration code has to have access to the application code. The obvious advantages are that you do not have to duplicate this logic outside the application, and you do not have to duplicate it across the various supported databases.

    Hope that helps.

    Cheers,
    Shashi

  9. Chris says:

    There is a tool called ChronicDB that can migrate data during schema changes lock-free.
