Monday, January 24th, 2011

Migrating HR data

Over the last year I was involved in a project that had to migrate HR data from one system (SAP) to another (PeopleSoft) because of a merger. In many data migration projects where PeopleSoft is involved, the data ends up being loaded into the system through Component Interfaces. And to be honest, it's the only way to go, unless of course you want to rebuild all the business logic somewhere else.

But that usually also means that literally all the validations and defaults are put into Application Engines to process the raw data. During this project we decided not to do that, but to split the load into two stages: first transforming the raw data into PeopleSoft-ready data, and second, putting the transformed data through Component Interfaces.

In the rest of this post I would like to bring the following to your attention:

  • Use of Informatica PowerCenter as an ETL tool
  • Use of Application Engines and Component Interfaces to load data
  • Extended use of Application Engines to start off sequential processes and split loads into groups that run simultaneously

 

Raw data

First the data from SAP was delivered through CSV files, complete with SAP codes for things like departments, salary grades and absence codes. These were mapped to PeopleSoft values and put into a list (also a CSV file) per subject. All the small lists, mostly translate values, were combined into one list with the field name as the first value. This made it possible to translate data from one value to another. The question was: how to make sure those mappings were put to good use?
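
To give an idea of how such a combined translate list can be used, here is a rough Python sketch that turns it into a lookup structure. The file name, field names and codes below are all made up for illustration; the real work was of course done in the ETL tool.

    import csv
    from collections import defaultdict

    # Build one lookup per PeopleSoft field from the combined translate-value file.
    # Assumed (fictive) layout per row: field name, SAP value, PeopleSoft value.
    def load_translate_map(path):
        lookup = defaultdict(dict)
        with open(path, newline="") as f:
            for fieldname, sap_value, ps_value in csv.reader(f):
                lookup[fieldname][sap_value] = ps_value
        return lookup

    # translate = load_translate_map("translate_values.csv")
    # translate["ABSENCE_CODE"].get("0200") could then return, say, "VAC"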

A staging area was constructed containing copies of all the targeted PeopleSoft data tables. Tables were also created to load the mapping files into. This gave the following (fictive) construction:

[Figure: staging tables for the PeopleSoft targets alongside their mapping tables]

This was done in Informatica PowerCenter. First, mappings and workflows were constructed to fill the mapping tables. That way everything was in place to transform the actual HR data from SAP format to PeopleSoft format. Like PeopleSoft, PowerCenter has a logical build-up of its components:

  • Source definitions
    • Flat file
    • XML
    • Database
  • Target definitions
    • Flat file
    • XML
    • Database
  • Transformation definitions
    • Lookups
    • Expressions
    • And many more (the above were the most used in this project)

The above three definitions allow the construction of another component, called a Mapping. Together these form the basic definition of what you want to do with your data. Next you need to define how you want that data to be handled, which is done in a Workflow. This component consists of the following subparts:

  • Task
  • Session
  • Worklet

Of these subparts, a Session is connected to a Mapping, and a Worklet is a combination of Sessions compiled into a group. A Workflow, in turn, is a mix of Sessions, Worklets or both.
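
For readers who have never opened PowerCenter, the way these components relate can be summarised in a small conceptual sketch. This is plain Python used as pseudocode, not PowerCenter's own API:

    from dataclasses import dataclass, field
    from typing import List, Union

    # Conceptual model only: how the PowerCenter building blocks relate to each other.
    @dataclass
    class Mapping:            # built from source, target and transformation definitions
        sources: List[str]
        targets: List[str]
        transformations: List[str] = field(default_factory=list)

    @dataclass
    class Session:            # a Session runs exactly one Mapping
        mapping: Mapping

    @dataclass
    class Worklet:            # a Worklet groups Sessions for reuse
        sessions: List[Session]

    @dataclass
    class Workflow:           # a Workflow mixes Sessions and Worklets (plus Tasks)
        items: List[Union[Session, Worklet]]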

After transforming the data through PowerCenter, a filled staging area was present for PeopleSoft to process. Per functional area an Application Engine was constructed to feed one or more Component Interfaces. Results, good and bad, were logged into a database table. This allowed for real-time reporting on the migration of the data, elapsed times per functional area and fallout percentages based on error groups.
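
The actual loaders were Application Engine programs, but the per-area pattern is simple enough to sketch in a few lines of hypothetical Python: read the staged rows, push each one through the Component Interface, and log the outcome either way, together with the elapsed time. The function names load_via_ci and log_result are stand-ins for the CI call and the insert into the logging table.

    import time

    # Hypothetical sketch of the per-functional-area pattern; the real work was done
    # by an Application Engine calling a Component Interface for every staged row.
    def process_functional_area(area, staged_rows, load_via_ci, log_result):
        start = time.time()
        errors = 0
        for row in staged_rows:
            try:
                load_via_ci(row)                                   # Component Interface call
                log_result(area, row, status="OK", message="")
            except Exception as exc:                               # validation or CI error
                errors += 1
                log_result(area, row, status="ERROR", message=str(exc))
        return {"area": area, "rows": len(staged_rows), "errors": errors,
                "elapsed_seconds": time.time() - start}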

 

Finally, the matter of performance. The initial aim was to load the data within the span of a weekend. During the first tests this proved to be a difficult task. After several rounds of performance testing, some time was gained by making database changes and assigning more resources to the environments used, but these options were soon exhausted. Speeding up Component Interfaces is hard, if not impossible, because of all the logic that needs to be processed; the main way of gaining time in this area is tuning the performance of your search records.

This, however, was not enough to get us close to the two-day mark, so we looked for ways to automate more of our processing steps. A first big gain in time came from building an Application Engine solely to set up all the run controls with the right values, delete any old or irrelevant data and check all the needed links to other applications. Now the only things that still had to be done manually were starting the Application Engine and checking the logging afterwards; all the steps integrated into the Application Engine fell off the to-do list.
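
Roughly, that preparation step boils down to something like the sketch below. All table, field and function names are invented, purely to illustrate the idea: seed the run controls, clean out old results and verify the links to other applications before anything starts.

    # Invented names throughout; db stands in for a database cursor/connection.
    def prepare_migration_run(db, run_id, functional_areas, external_links):
        # clean out results from earlier runs
        db.execute("DELETE FROM PS_MIG_RESULT_LOG WHERE RUN_ID = ?", (run_id,))
        # seed one run control per functional area with the right values
        for area in functional_areas:
            db.execute(
                "INSERT INTO PS_MIG_RUN_CNTL (RUN_ID, AREA, STATUS) VALUES (?, ?, 'NEW')",
                (run_id, area),
            )
        # verify every needed link to other applications before starting
        missing = [name for name, is_reachable in external_links.items() if not is_reachable()]
        if missing:
            raise RuntimeError("Links not available: " + ", ".join(missing))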

But the biggest gain in time came from splitting the bigger functional areas, like Personal Data and Job Data, into several groups that ran simultaneously. Having the Process Schedulers and Application Servers process one load of 8000 or four loads of 2000 made a huge difference. Splitting it into more loads proved to give minimal extra performance, so we left it at four identical streams. To get this to work, another Application Engine was constructed to pick up the run control values and start off a given number of streams, divided equally over all the available Process Schedulers.
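
The splitting itself comes down to chopping the population into roughly equal chunks and spreading those over the available Process Schedulers. A minimal sketch, assuming a plain list of employee IDs and made-up scheduler names:

    # Split a load of employee IDs into a fixed number of roughly equal streams and
    # assign each stream to one of the available Process Schedulers (round robin).
    def split_into_streams(emplids, stream_count, schedulers):
        streams = [emplids[i::stream_count] for i in range(stream_count)]
        return [{"scheduler": schedulers[i % len(schedulers)], "emplids": chunk}
                for i, chunk in enumerate(streams)]

    # Example: 8000 employees over four streams, two (fictive) Process Schedulers.
    assignments = split_into_streams(["EMP%05d" % n for n in range(8000)], 4, ["PSUNX1", "PSUNX2"])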

The final result of this project was a total migration with a fallout of less than 0.5% for a population of over 8000 employees and more than 350,000 processed rows. This was possible thanks to the several test runs done in advance. Controlled runs with actual production data allowed for testing of protocols, processes and data quality.



Category: HR / Technical