Sunday, October 16, 2016

Migrate Grails 2.2.1 application data from Oracle to PostgreSQL (Part-2 of 2) . . .

Phase-1 Database Migration

Part-2: Migrate Data from Oracle to PostgreSQL

In Part-1, I covered migrating Grails Application from Oracle to PostgreSQL.

In this Part-2, I cover Data Migration to conclude this two-part series. This is not a comprehensive post by any means. I only cover what I experienced in this process.

After migrating the Grails application, my next step was to migrate data. I knew this is where things would get a bit tricky. I always feel that software development is made interesting by its many competing technologies and products, but messy by the fact that they share only a few similarities and have many differences. Competing technologies emerge with little or no standards and deviate from one another considerably. Databases are no exception.

During application development, the debate over application logic getting buried in stored procedures, as opposed to living in the application's business/service layer, often comes down to the question "What if we change the database?". The answer, "We won't change our database", often wins the debate and pushes the roots of business logic deeper and deeper into the database. Lately, PostgreSQL has been making many rethink investing in expensive databases. Change is inevitable. The time to "change your database" has come, to some at least!

My first effort

My first effort was to turn to Google and find any tool(s) that could help migrate data from Oracle to PostgreSQL. After checking several commercial tools, an open-source tool (Ora2Pg), and trying out one tool (Navicat), the decision was easy: I decided not to use any tool.

Most of the tools migrate both schema & data, often resulting in several datatype mismatches between the Grails application's domain model and the database schema, especially with fields like boolean that have a close-matching database equivalent (the boolean type) in PostgreSQL. It's not easy for the tools to infer these details from database schema/metadata alone.

The Approach

I took the well-known approach in Software Engineering: ETL - a 3-Step process of migrating data.
  1. Extraction - I leveraged the SQL Developer tool to extract data into delimited files.
  2. Transformation - I wrote Groovy scripts to transform the data, touching only those data types that needed special care.
  3. Load - I wrote Groovy scripts to load the transformed data files into PostgreSQL using its COPY command, which is very handy for this task.

Data Migration Step-by-Step

Step-0: Create PostgreSQL database and generate schema

Create the PostgreSQL database and generate the schema by leveraging Grails and the dbm-plugin. I covered this in Part-1.

Step-1: Extract data from Oracle

There are many possible ways to extract/export data from Oracle. I used the SQL Developer tool to extract data into multiple delimited files, each file containing the exported data of a specific table.

Certain data types can be transformed during data export by setting preferences. The default date format is one such thing that differs from database to database. With this ETL approach, dates would normally need to go through transformation due to the default-format differences between databases. That transformation can easily be avoided by handling it in this Extraction/Export step. SQL Developer allows setting data preferences for a session. With this feature, we can set preferences for Date/Timestamp fields and change the format to the default PostgreSQL date format.

Set Data Preferences

Open SQL Developer and connect to the database. To check the current preferences, go to Preferences > Database > NLS from the SQL Developer main menu.
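If you prefer checking the formats with a query instead of the Preferences dialog, the standard NLS_SESSION_PARAMETERS view should show the current session values (just a quick sanity check, not required for the export):

select parameter, value
from nls_session_parameters
where parameter in ('NLS_DATE_FORMAT', 'NLS_TIMESTAMP_FORMAT');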

Execute the following SQL to alter date format for that particular session:

alter SESSION set NLS_DATE_FORMAT = 'MM/DD/YYYY HH24:MI:SS';
alter SESSION set NLS_TIMESTAMP_FORMAT = 'MM/DD/YYYY HH24:MI:SSXFF';

Export Data

Go to Tools > Database Export and follow these steps:
Select Connection
Uncheck Export DDL
Format: delimited
Delimiter: % (your preferred char; I chose % because many of the other special characters appear in the actual data in one column or another)
Left Enclosure: "
Right Enclosure: "
Save as: Separate Files
File: enter dir name (where the exported data to be saved)

Simply go through the next steps to finish. You will have data exported into multiple files in the specified directory, with each file named <TABLE_NAME>_DATA_TABLE.dsv. For instance, if you have a table MY_TABLE, its exported delimited data file would be: MY_TABLE_DATA_TABLE.dsv.

Step-2: Transform Data

This step requires finding all data types that need data transformation. With the Date type taken care of during the Extraction process, I was only left to deal with Boolean/boolean data types, transforming 1/0 to true/false. This requires identifying all such data columns in the entire schema. There are two choices for finding those columns: 1) database metadata, 2) the Grails domain model. I chose the Grails domain model and wrote a simple script to find the list of tables and columns that are mapped to Boolean/boolean properties of the domain model. I covered this in my previous post: The power of Grails Metadata is there when you need it. Once I found those details, I had everything that I needed to transform data.
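To make the transformation concrete, here is a hypothetical example (the table, columns and values are made up for illustration): a %-delimited export of a table with a boolean DELETED column gets its header lowercased and its 1/0 values rewritten to t/f.

MY_TABLE_DATA_TABLE.dsv (exported from Oracle):
ID%NAME%DELETED
1%"First record"%0
2%"Second record"%1

MY_TABLE_DATA_TABLE.dsv.migrated (after transformation):
id%name%deleted
1%"First record"%f
2%"Second record"%t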

I wrote the following Groovy script for transforming data:

import groovy.json.JsonSlurper
/**
 * oracle-to-postgres-data-migrator.groovy
 * This groovy script migrates/transforms delimited data-files exported from Oracle to data-files that are
 * PostgreSQL compliant. It takes care of special data migration needed for data-column types like boolean,
 * and nullable FK columns that get outputted as "" for nulls when left enclosure (") and right enclosure (")
 * characters are used during data export.
 *
 * Certain dependencies like: base directory the data-files are located in, data-file extension etc. are
 * externalized into the postgres-data-migration-config.groovy file for making this script a little flexible.
 *
 * Prerequisites
 * 1. Run script: boolean-columns-finder.groovy from the application's home dir.
 * 2. Export all data from the Oracle database into delimited files from a tool like: SqlDeveloper.
 * 3. Adjust any configurations needed in the postgres-data-migration-config.groovy file.
 *    Especially config properties like: delimiter and baseDir
 *
 * Dependencies (input)
 * 1. Requires myAppTablesBooleanColumns.json to exist in the application home directory. This file is the
 *    result of executing boolean-columns-finder.groovy and contains a map of table names and corresponding
 *    lists of boolean columns.
 * 2. A set of delimited data files exported using a tool like SqlDeveloper.
 *
 * Assumptions for Data Export (based on the tool used, SqlDeveloper, to export Oracle data)
 * . Each table's data is exported into its own data file with a filename like: <TABLE_NAME>_DATA_TABLE.dsv
 * . To simplify parsing, % is used as the delimiter because it is one unused character found in the application's data.
 * . Each data file contains a header as the first line, which is nothing but column names in capital letters.
 * . Left enclosure (") and right enclosure (") characters are used during data export, which enclose column-data
 *   like long text that contains embedded new lines, and nullable FKs that are null.
 * . Also, the default date format is different between Oracle and PostgreSQL. The data migration of date columns
 *   has been eliminated by leveraging SqlDeveloper to export data in the format that PostgreSQL can import.
 *   This can be achieved by running the following SQLs for the session in SqlDeveloper before data export.
 *     alter SESSION set NLS_DATE_FORMAT = 'MM/DD/YYYY HH24:MI:SS';
 *     alter SESSION set NLS_TIMESTAMP_FORMAT = 'MM/DD/YYYY HH24:MI:SSXFF';
 *
 * Result (output)
 * 1. Migrated data files. Each data file (<TABLE_NAME>_DATA_TABLE.dsv) will have a corresponding migrated data
 *    file (<TABLE_NAME>_DATA_TABLE.dsv.migrated)
 * 2. Only data that needs to be migrated (for boolean data) goes through data migration.
 *    The input json file: myAppTablesBooleanColumns.json contains all tables that need data migration. For every
 *    table that needs data migration, the corresponding input data-file name is derived by the above mentioned
 *    filename assumptions.
 *    e.g. table: my_table, data-file: MY_TABLE_DATA_TABLE.dsv, migrated data-file: MY_TABLE_DATA_TABLE.dsv.migrated
 * 3. The rest of the tables' data that does not need data migration (no boolean columns) is a simple passthrough
 *    with just the header migration. The header migration simply converts column names to lowercase.
 *
 * @see postgres-data-migration-config.groovy
 * @see boolean-columns-finder.groovy
 *
 * @author Giri Pottepalem
 */
def config = new ConfigSlurper().parse(new File('postgres-data-migration-config.groovy').toURI().toURL())
def dataDelimiter = config.data.delimiter
def datafileBaseDir = config.data.file.baseDir
def datafileExtension = config.data.file.extension
def migratedDatafileExtension = config.data.migrated.file.extension
def tablesBooleanColumnsFile = config.tables.boolean.columns.file

if(!new File(tablesBooleanColumnsFile).exists()){
    println "Error: Missing required file ${tablesBooleanColumnsFile} for data migration."
    System.exit(1)
}

Map tablesBooleanColumnsMap = new JsonSlurper().parseText(
    new FileReader(tablesBooleanColumnsFile).text
)

def dataFilesNeedDataMigration = tablesBooleanColumnsMap.keySet().collect {
    "${it.toUpperCase()}${datafileExtension}"
}
def allDataFiles = new File(datafileBaseDir).listFiles().name.findAll {
    it.endsWith(datafileExtension)
}
List dataFilesJustNeedHeaderMigration = allDataFiles - dataFilesNeedDataMigration

println "Migrating just header for all data files that don't need data migration..."
dataFilesJustNeedHeaderMigration.each { dataFilename ->
    String absoluteDataFilename = "${datafileBaseDir}/$dataFilename"
    print "Reading data file: $absoluteDataFilename..."
    File dataFile = new File(absoluteDataFilename)
    File migratedDataFile = new File("${absoluteDataFilename}${migratedDatafileExtension}")
    migratedDataFile.withWriter { fileWriter ->
        dataFile.eachLine { line, lineNumber ->
            if (lineNumber == 1) {
                fileWriter.writeLine(line.toLowerCase()) //migrate header
            } else {
                fileWriter.writeLine(line) //take data as is
            }
        }
    }
    println "Migrated"
}

println "Migrating data..."
tablesBooleanColumnsMap.each { String tableName, List booleanColumns ->
    String dataFilename = "${datafileBaseDir}/${tableName.toUpperCase()}${datafileExtension}"
    print "Reading data file: $dataFilename..."
    File dataFile = new File(dataFilename)
    File migratedDataFile = new File("${dataFilename}${migratedDatafileExtension}")
    String header
    def columnNames

    migratedDataFile.withWriter { fileWriter ->
        dataFile.eachLine { String line, int lineNumber ->
            if (lineNumber == 1) {
                header = line.toLowerCase() //migrate header
                columnNames = header.split(dataDelimiter)
                fileWriter.writeLine(header)
            } else {
                //migrate data
                String[] columnData = line.split(dataDelimiter, -1)
                def booleanColumnIndexes = booleanColumns.collect { columnName ->
                    columnNames.findIndexOf { it.equalsIgnoreCase(columnName) }
                }
                booleanColumnIndexes.each { booleanColumnIndex ->
                    if (booleanColumnIndex < columnData.length) {
                        switch (columnData[booleanColumnIndex]) {
                            case null: columnData[booleanColumnIndex] = ''; break
                            case '1' : columnData[booleanColumnIndex] = 't'; break
                            case '0' : columnData[booleanColumnIndex] = 'f'; break
                        }
                    }
                }
                //update data that ended up as "" (due to blanks in the data export) to ''
                columnData.eachWithIndex { data, i ->
                    if (data == '""') { columnData[i] = '' }
                }
                fileWriter.writeLine(columnData.join(dataDelimiter))
            }
        }
    }
    println "Migrated"
}
println "Done"

Following is a sample of the externalized configuration properties (as a Groovy script) used to make the above script a bit more flexible:
/**
 * postgres-data-migration-config.groovy
 * Configuration properties as groovy script
 * @author Giri Pottepalem
 */
data {
    delimiter = '%'
    file {
        baseDir = './export' //dir in which delimited data files exist
        extension = '_DATA_TABLE.dsv'
    }
    migrated {
        file { extension = '.migrated' }
        //need to be loaded in specific order due to foreign key constraints
        files = [
            'MY_TABLE1_DATA_TABLE.dsv.migrated',
            'MY_TABLE2_DATA_TABLE.dsv.migrated',
            ...
        ]
    }
}
appHome = '/gateway' //myApp application home dir
//file that contains list of tables and boolean columns for each table. Result of running boolean-columns-finder.groovy
tables.boolean.columns.file = "${appHome}/myAppTablesBooleanColumns.json"
postgres {
    dataSource {
        url = 'jdbc:postgresql://localhost:5432/myApp'
        user = 'postgres' //superuser for running COPY
        password = 'password'
        driverClassName = 'org.postgresql.Driver'
    }
    //Adjust sequences with values taken from oracle db by: select sequence_name, last_number from user_sequences;
    sequences = [
        'hibernate_sequence': 150461,
        'table1_sequence'   : 4041
    ]
}

Step-3: Load Data

Once data is migrated, find out data dependencies order and load data in the required order that satisfies foreign-key constraints. I covered this in my previous post: Groovy Script to load CSV data files into PostgreSQL database onto Amazon RDS.
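If the dependency order isn't obvious, one way to work it out is to ask PostgreSQL for the foreign-key relationships and then order parent tables before child tables. The following is a minimal Groovy sketch of that idea; the connection settings are assumptions that simply mirror the earlier examples, so adjust them to your database:

@GrabConfig(systemClassLoader=true)
@Grab(group='org.postgresql', module='postgresql', version='9.4.1211.jre6')
import groovy.sql.Sql

//assumed connection settings - adjust to your database
def sql = Sql.newInstance(
    'jdbc:postgresql://localhost:5432/myapp', 'myapp', 'myapp', 'org.postgresql.Driver'
)

//child table -> parent table pairs derived from foreign-key constraints
def fkPairs = sql.rows('''
    SELECT tc.table_name AS child_table, ccu.table_name AS parent_table
    FROM information_schema.table_constraints tc
    JOIN information_schema.constraint_column_usage ccu
      ON tc.constraint_name = ccu.constraint_name
    WHERE tc.constraint_type = 'FOREIGN KEY'
''')
fkPairs.each { println "${it.child_table} depends on ${it.parent_table}" }
sql.close()

Each child table should be loaded only after all of its parents; the resulting order can then go into the files list of the migration config shown above.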


Sunday, October 09, 2016

The Power of Grails Metadata is there when you need it . . .

Grails is built on top of frameworks and technologies like Spring and Hibernate and keeps them fully accessible. With the concepts of Convention-over-Configuration, sensible defaults, GORM, and many architectural & design decisions and solutions to known issues taken into the framework, it multiplies a developer's productivity to a degree that would otherwise only be imaginable.

In the Java space, as frameworks keep getting better over the years, they take developers to higher levels of coding with no need to deal with low-level details. It's a myth, not a fact, if one argues that "frameworks hide details". A good framework does not do that; it gives full access to the underlying technologies. Grails is one such framework.

Recently I had a need to find the complete list of database tables and columns that are mapped to Boolean/boolean domain object properties in a Grails 2.2.1 project. There are two possible ways to achieve this in a Grails project: 1) by examining the metadata of the database, or 2) by examining the metadata of the Grails application execution environment itself. The second option, the Grails way, seemed more reliable to me. I started looking into the Grails API and could get the details pretty easily from the metadata it exposes.

Following is the code-snippet with parts of interest highlighted:

/**
 * boolean-columns-finder.groovy
 * Grails groovy script which finds persistable Boolean/boolean properties of all domain
 * objects and finds corresponding database mappings of columns and tables. Generates a
 * JSON file that contains list of Boolean/boolean column-names for each table-name
 * mapped to a domain object that contains Boolean/boolean properties.
 *
 * @author Giri Pottepalem
 */
import groovy.json.JsonBuilder

def sessionFactory = ctx.getBean("sessionFactory")
Map tablesBooleanColumnsMap = [:]

//get table names and column names of all domain objects that have Boolean/boolean properties
grailsApplication.getArtefacts("Domain").each { domainObject ->
    def dbTableName = sessionFactory.getClassMetadata(domainObject.clazz).tableName
    def booleanProperties = domainObject.properties.findAll {
        it.isPersistent() && it.type.name.contains('oolean')
    }*.name

    if (booleanProperties) {
        def dbBooleanColumnNames = booleanProperties.collect {
            sessionFactory
                .getClassMetadata(domainObject.clazz)
                .propertyMapping
                .getColumnNames(it)
        }.flatten()
        tablesBooleanColumnsMap << [(dbTableName): dbBooleanColumnNames]
    }
}

//write the map built to a file as JSON
def jsonFile = "${grails.util.Metadata.current.getApplicationName()}TablesBooleanColumns.json"
new File(jsonFile).write(
    new JsonBuilder(tablesBooleanColumnsMap).toPrettyString()
)
println "Successfully extracted mappings for Boolean/boolean properties found" +
        " and created a JSON file: $jsonFile"

Note

Grails makes the main interface to a running application available through the grailsApplication variable and its context through the ctx variable. This script can be run from the grails console and also from the command line.

To run it from the command line, run it from the project home directory (assuming that the script is located under the src/groovy dir of the grails project) like:
grails run-script src/groovy/boolean-columns-finder.groovy

Summary

Grails puts you in the driver's seat. It builds on top of the technologies & frameworks it uses without hiding any details from you. When you really need low-level details, they are all available to you.


Friday, October 07, 2016

Groovy script to load CSV data files into PostgreSQL database onto Amazon RDS . . .

Recently, I worked on a task to migrate a Java 6, Grails 2.2.1 application from Oracle to PostgreSQL. In my recent post (Part-1 of 2), I covered the details of migrating the Grails application itself. I have yet to cover the details of data migration in the next part. In this post, however, I just wanted to share some of my findings and learnings on loading migrated data into a PostgreSQL database hosted on Amazon RDS.

As always, for any problem, there are multiple solutions. After quickly looking at a couple of commercial tools and an open-source tool that offer migrating a database from Oracle to PostgreSQL, and trying out the trial version of a commercial tool, I decided not to use any such tool for data migration. Many of these tools simply migrate the schema and attempt to migrate data & other database objects to the closest possible match in the target database. It often results in a fully migrated database, but leaves you to deal with the problem of object-relational impedance mismatch.

Since I had already leveraged the Grails database-migration plugin to create the new schema, constraints and sequences for PostgreSQL, all I needed was to get the data migrated and loaded into the newly created schema. I took the well-known approach in Software Engineering: ETL, a 3-step process of migrating data. For step 1) Extraction - I leveraged the SQL Developer tool to extract data into delimited files. For step 2) Transformation - I wrote Groovy scripts to transform the data, taking care of only those data types that needed transformation. For step 3) Load - I wrote Groovy scripts to load the transformed data files into PostgreSQL using its COPY command, which is very handy for this task.

All looked great with my local PostgreSQL database. I was able to script out both the data transformation and the loading tasks. But when I ran my loader script against the Amazon RDS hosted PostgreSQL database to load the transformed data, the COPY command simply failed with a misleading message saying "must be superuser for running COPY".

After some investigation and some reading of the Amazon docs, I learned that Amazon RDS doesn't allow the PostgreSQL COPY command for security reasons. However, it does allow the \COPY command, but that requires being run through psql, the interactive command-line tool that comes with PostgreSQL. After playing with the \COPY command in psql, I was able to manually load one migrated data-file into a specific database table on Amazon RDS. Cool!

My next step was to take this into my loader Groovy script. As Java allows executing an external program/process from a Java application, and Groovy enhances the java.lang.Process class further, it was easy to execute the psql tool from my script and get all migrated data files loaded into the Amazon RDS hosted PostgreSQL database.

Following is a snippet of the Groovy script code that is of particular interest in this post (certain details are omitted for brevity):
@GrabResolver(name='postgres', root='https://mvnrepository.com/artifact/org.postgresql/postgresql')
@GrabConfig(systemClassLoader=true)
@Grab(group='org.postgresql', module='postgresql', version='9.4.1211.jre6')
import groovy.sql.Sql

//read config groovy file that contains all necessary configurations
//including data-source, list of migrated files in the order to be loaded etc.
def config = new ConfigSlurper().parse(
    new File('postgres-data-migration-config.groovy').toURI().toURL()
)
...
def sql = Sql.newInstance(
    dataSource.url, dataSource.user, dataSource.password, dataSource.driverClassName
)
boolean isLoadingToAmazonAws = dataSource.url.contains('amazonaws.com')

//load each migrated data file into its table
dataFiles.each { String dataFileName ->
    println "loading data: $dataFileName"
    String header = new File("${datafileBaseDir}/${dataFileName}").withReader {
        return it.readLine()
    }
    //transform header: "h1"%"h2"%"h3" into: h1,h2,h3 to form
    //column names needed for COPY or \COPY command
    String columnNames = header.replaceAll(dataDelimiter, ',').replaceAll('"', '')
    ...
    if(isLoadingToAmazonAws) {
        //load data onto amazon aws database
        //build psql process command with required arguments to invoke it as an external process from this script
        def copyCommand = [
            'psql',
            "--host=${parsedDataSourceUrl.host}",
            "--port=${parsedDataSourceUrl.port}",
            "--dbname=${parsedDataSourceUrl.dbname}",
            "--username=${dataSource.user}",
            "--no-password", //suppress password prompt
            "--command=\\COPY ${tableName}(${columnNames}) FROM '${datafileBaseDir}/${dataFileName}' DELIMITER '${dataDelimiter}' CSV HEADER"
        ]
        println copyCommand.join(' ') //this is how it would look if it were run from the command-line

        //run psql tool as an external process
        def process = copyCommand.execute()
        def out = new StringBuffer()
        def err = new StringBuffer()
        process.waitForProcessOutput(out, err) //wait for the process to finish
        println "exit code: ${process.exitValue()}"
        if (out.size()) { println "Success: $out" }
        if (err.size()) { println "Error: $err" }
    } else {
        //load data to a local database by simply running COPY command as an sql statement
        String copyStatement = """
            COPY ${tableName}(${columnNames})
            FROM '${datafileBaseDir}/${dataFileName}'
            DELIMITER '${dataDelimiter}' CSV HEADER
        """
        println copyStatement
        boolean status = sql.execute(copyStatement)
        println status ? "Error loading..." : "Successfully loaded..."
    }
}

println "Altering sequence values..."
postgresSequences.each { sequenceName, value ->
    String alterStatement = "ALTER SEQUENCE $sequenceName RESTART WITH $value"
    println alterStatement
    boolean status = sql.execute(alterStatement)
    println status ? "Error altering sequence $sequenceName"
                   : "Successfully altered sequence: $sequenceName to start with: $value"
}
println "Done"

A sample psql \COPY command for running from a terminal command prompt looks like:
psql \
  --host=myhost.rds.amazonaws.com --port=5432 --dbname=my-db \
  --username=my-user --no-password \
  --command=\COPY my_table(col1, col2) FROM '/my-file.dsv.migrated' DELIMITER ',' CSV HEADER

A sample result of the above psql execution that also gives number of records loaded looks like:
exit code: 0
Success: COPY 43628

Note

No password is used for executing the \COPY command through the psql tool. The password can be stored in ~/.pgpass, a file under the user's home dir, or in any other file referenced by the env variable PGPASSFILE, in the format: hostname:port:database:username:password
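For example, for the connection used in the sample command above, a hypothetical ~/.pgpass entry would look like the following (the file must be readable only by you, e.g. chmod 600, or psql ignores it):

myhost.rds.amazonaws.com:5432:my-db:my-user:my-secret-password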


Sunday, October 02, 2016

Migrate Grails 2.2.1 application from Oracle to PostgreSQL (Part-1 of 2) . . .

Lately, I had to take up the task of migrating one of our Grails 2.2.1 applications. The migration was two-fold: 1) from Oracle to PostgreSQL Database and 2) from Grails 2.2.1 to a newer 3.x version.

We took a two-phase approach by attempting the database migration first. In this first part of a two-part post, I will briefly cover the kind of changes I had to make to migrate the application from Oracle to PostgreSQL.

Phase-1 Database Migration

Part-1: Migrate Grails Application from Oracle to PostgreSQL

The application was a straightforward Java 6, Grails 2.2.1 application with the database-migration plugin used for tracking database changes that the application went through during development and afterwards.

My first-decision
There were about a dozen database-migration change scripts in the application. It was a no-brainer for me to decide not to look into that history of changelogs. Who knows what kind of database changes the app had gone through over time, and there is no point in migrating all that history. I decided to regenerate a base changelog script that gets me started with the current state of the model as the base.

Following are the steps:

Step-1 Change dependencies - remove Oracle and add PostgreSQL
BuildConfig.groovy
dependencies {
    ...
    //runtime 'oracle:ojdbc6:11.2.0.1.0'
    runtime 'org.postgresql:postgresql:9.4.1211.jre6' //Java 6 PostgreSQL driver
    ...
}

Step-2 Change Datasource
DataSource.groovy
environments {
    dataSource {
        pooled = true
        driverClassName = "org.postgresql.Driver"
        dialect = "org.hibernate.dialect.PostgreSQLDialect"
        configClass = HibernateFilterDomainConfiguration.class
    }
    ...
    development {
        dataSource {
            dbCreate = "none" //one of 'create', 'create-drop', 'update', 'none'
            driverClassName = "org.postgresql.Driver"
            dialect = "org.hibernate.dialect.PostgreSQLDialect"
            url = "jdbc:postgresql://localhost:5432/myapp"
            username = "myapp"
            password = "myapp"
        }
    }
    ...
}

Make sure that you have a PostgreSQL db installed, the myapp database created, and accessible by the user myapp.
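If the database and user don't exist yet, something along these lines from psql as a superuser should do it (the names and password simply mirror the DataSource above; adjust to your setup):

CREATE ROLE myapp WITH LOGIN PASSWORD 'myapp';
CREATE DATABASE myapp OWNER myapp;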

Step 3 Regenerate Database changelog and get new schema created
Run the following command from the application home dir to generate baseline changelog:
grails dev dbm-generate-gorm-changelog baseline.groovy
This will generate a baseline.groovy file under grails-app/migrations

Step 4 Make necessary changes to take the generated file into the app
Depending on how the previous Oracle database migration changes were organized, take baseline.groovy as the new base changelog file in the application.

I took this as the base changelog by creating the main changelog.groovy file that the plugin looks for under the grails-app/migrations dir. I also created a releases/postgresql sub-dir under grails-app/migrations and moved the generated baseline.groovy file there. I then modified changelog.groovy to reference it as shown below:
changelog.groovy
databaseChangeLog = {
    include file: 'releases/postgresql/baseline.groovy'
}

Step 5 Get database objects created
Run the following database-migration command from the application home directory to get database objects like tables, constraints, sequences etc. created in the database from the baseline.groovy changelog. With this, your database schema should be in line with the grails domain model.
grails -Dgrails.env=development dbm-update
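To sanity-check that the objects were actually created, listing the tables from psql is one quick way (the connection details below are the ones assumed above; \ds similarly lists the sequences):

psql -h localhost -d myapp -U myapp -c '\dt'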

Step 6 Run the application
Run the app and see if it comes up. If it doesn't, there could be some issues with data type differences between Oracle and PostgreSQL. Check Bootstrap.groovy and see if you have any data initialization that's causing issues with data types. Also, setting the root log-level to debug in Config.groovy will be helpful.
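For reference, in Grails 2.x that's the log4j DSL in Config.groovy; a minimal sketch (the org.hibernate.SQL logger is an optional extra that is handy while verifying the new schema):

log4j = {
    ...
    root {
        debug()
    }
    debug 'org.hibernate.SQL' //optional: also log the SQL Hibernate generates
}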

Step 7 Code changes (if any needed)
Make necessary code changes. In our case, we had the Grails Hibernate filters plugin used in the app, and boolean/Boolean was the only data type that needed special care. With Oracle, it was mapped to the database column type NUMBER. PostgreSQL provides the standard SQL boolean type. I had to change some of the filters that used boolean properties of domain objects. The following is an example of one such filter, before and after the migration change:

//before change
static hibernateFilters = {
    active(condition: 'deleted=0', default: true)
}

//after change
static hibernateFilters = {
    active(condition: "deleted='false'", default: true)
}

Once I made the above code changes for boolean/Boolean data types, the application worked like a charm. I quickly poked around the application to see if anything was broken. Everything seemed to be working. The application also had a bunch of test cases written, and all of them passed.

Step 8 Cleanup (if any needed)
Once everything looks good, and given the approach taken of not migrating the old database migration scripts but regenerating a baseline changelog, the old changelogs can be safely deleted. The Oracle driver can also go away if it was added to the lib folder.

That was not bad at all ;)

In my next part (Part-2), I will write about Data Migration from Oracle to PostgreSQL, which will conclude Phase-1: Database Migration from Oracle to PostgreSQL.