Category IconCategory Icon
Informatica != SSIS (part 1)

Posted on May 22, 2008 by Bram@BI-Nerd.com
Filed Under Informatica, Microsoft, Tips 'n Tricks |

Hopping from assignment to assignment (or continent to continent for some of you) you’re bound to work with various pieces of software. From Cognos to Business Objects, from Oracle to MS SQL server, or from Informatica to Microsofts SSIS. It is ’stating the obvious’ to say that all those programs -despite having similar goals and functionality- work quite different. This article will delve deeper into some of the peculiar differences one might run into when going from Informatica to SSIS (based on my experience).

Lookups 1:
The first thing you’ll notice after having created a couple of lookups is that the ONLY way to create a lookup through the standard UI is by doing equal-comparison lookups, i.e. WHERE LOOKUP1=SOURCE1. You cannot use something like LOOKUP1>SOURCE1: it’s equal-to or nothing.
Fortunatly there is a (dirty) solution. Select the tab “Advanced” in the lookup transformation and enable “Enable memory restriction”. Next enable “Modify the sql statement’, you’ll see a small query based on the selected table. It will probably look something like this:
[CODE]
select * from
 (select * from LOOKUPTABLE) as refTable
where [refTable].[LOOKUPCOLUMN] = ?

[/CODE]
Next: update the statement to do whatever you want. For example update it to “[refTable].[LOOKUPCOLUMN] > ?”. Click on the “Parameters”-button to assign source columns to the questionmarks (the order of the questionmarks in your code determines the ‘name’ of the parameter, with the first being Parameter0).
Important note: Despite the fact that you’re overriding the lookup SQL statement, it still needs the mappings on the “Columns” tab. Make sure you’ve drawn a line between all the source columns you’re using as parameters to a lookup column. Which one doesn’t really matter since you’re overriding it anyway (as long as your source column has the same datatype as your lookup column).

Lookups 2:
Learn to properly set your datatypes. Where Informatica was a bit more forgiving about the use of datatypes, SSIS is not. For example, you cannot link a 2 byte integer to a 4 byte integer column. A simple solution is to use a Data Conversion transformation.

Lookups 3:
Watch out using lookups on string values. Let’s suppose you want to link vendordata to a certain vendortable. The vendordata table (source) contains a column with the string “Vendor1″ (no spaces). Your lookup contains the same string, no spaces again! Easy you’d say. Possible wrong i’d say. If the source/target field are both defined as a 10 character sized string, it won’t work, despite that both fields contain the exact same characters without spaces. For some reason SSIS considers reading a source that is allowed to have 10 characters, as a field which it will have to append with white spaces to fill up all 10 characters. To add some more confusion, it only does this on lookups. A solution to this is even more stupid: add a derived column box to your data flow and rtrim and ltrim the original value. The new value (despite still being defined as a 10 character field) will NOT be appended with spaces.

Some small tips and tricks now!

Delay validation:
SSIS allows a broad range of customization to your packages. You can create your own sources, transformations and targets as well as adding scripting tasks or using stored procedures and using their output as input for other processes. One example of this was that we created a temporary table (via a stored procedure) to which data was sent. After being processed in a data flow based on a specific query (variables / script task) the temp-table was deleted. This means that whenever the package would be started again, SSIS would come up with errors since it couldn’t find the target temp-table. This is caused by SSIS’s pre-run validation of the package. Fortunately you can disable validation until needed (when data is written to the target in this example) by setting DelayValidation (control flow -> properties) to True.

Lineage:

You’ll grow to hate those lingeage ID’s. They’re internal IDs identifying datastream (columns). They are often not capable of adepting to new situations. Example: suppose you’ve made a lookup and named the value you’ve lookupped lkp_VALUE. You decided that the lookup was wrong and deleted it and replaced it with a new lookup which will output a value called lkp_VALUE. Despite having the exact same name, lots of transformations will return errors now. This is because the ID of the new lookup is different from the old one. Simply double-click on the transformation and select the option “<Map using column name>”. This should solve it if you’ve used the exact same name.

Missing data after union-all

A union all transformation requires that you manually add new columns after having replaced/updated anything column-related earlier on in the data flow. Forgetting to do this, might make it seem that a column is not available! So: after each change, always check your union all transformations.

That’s it for part 1. As I keep on learning more about SSIS (and it’s quirks) I will add more parts if people like (if not: i will probably still do so). I hope this allows the change from Informatica to SSIS a bit more smooth for some of you out there!

These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Digg
  • Fark
  • Technorati
  • del.icio.us
  • Google
  • StumbleUpon
Email This Post Email This Post

Comments

One Response to “Informatica != SSIS (part 1)”

  1. hessel on May 23rd, 2008 7:57 am

    Very nice article Bram!!

Leave a Reply