Testing ETL or: Some More Fun with the Informatica Repository
Posted on April 8, 2008 by Thomas
Filed Under Data Warehousing, Informatica, Tips 'n Tricks |

Family and friends sometimes ask me (although usually not more than once) what exactly it is that I do. Sure, you’re a Business Intelligence slash data warehouse slash ETL specialist, but honestly; do you honestly expect us to know what that means? Management information? Huh? I usually answer the question by explaining that, on a very basic level, I create database tables and the fancy programs that fill them from other tables. The very basis of any BI environment in a nutshell.
Those that didn’t stop listening somewhere halfway - and understood what I was trying to tell them - are generally rather disappointed. Filling tables with data? Is that all? What’s so complicated about that?
The rest of the conversation I will spare you, but truth be told: it took me a while to come up with a satisfying answer to this often-asked question. What is it that an ETL developer does all day? Really, what is it about our job that is so damn complicated? I thought about writing an article about the necessary steps for building a proper ETL flow, or perhaps about the challenges that we face… HOWEVER… Saying it like that makes them seem like an awfully boring subjects for an article.
So: now for something completely different. Let’s put a spotlight on one of the ETL development phases - testing - and talk about how Informatica Repository SQL can make that task a little easier.
People who read my articles will know that I’m a fan of keeping my work as challenging as I can. I do this, on the one hand, by spending as much time as I can on things that are complicated (and often not entirely necessary) and, on the other hand, by spending as little time as possible on stuff that I consider standard or repititive. And some of my most shameful behaviour I bring to bear during the testing phase of ETL development.
Testing is, of course, necessary. No one’s perfect. But checking other people’s work is still terribly boring in my book. So if am unlucky enough to find myself in that position, I generally want to be done with it as quickly as I can. Anything that can be automated, I say, should be automated.
The responsibilities of an ETL tester as I see them are three-fold. From basic to advanced:
-
Does the mapping adhere to the development standards and naming conventions?
-
Does mapping do what the technical design says it should do?
-
Does the mapping work correctly in relation to other processes in your data logistical flow?
When I test my mappings, I start out with the first and from there work my way down the list. As said; simple equals boring, and rules and conventions curbing the creative process are possibly even more annoying, so I generally spare the first point only a cursorary glance. That was, until one woeful day our data warehouse project’s administrator would not approve our ETL and pointed the big Finger of Responsibility in my direction. “From now on, ye shall be doing the checking of the transformation names!”
Bugger. But, no harm done. It was quickly decided to leave the first step to my most trusted, mindless computer minion: the Informatica Server. A set of mappings containing nothing more than a bunch of queries on the Informatica Repository would give me a quick list of which mappings or workflows did not have a description added, which transformations did not adhere to our naming conventions, which sessions still had ‘bulk load’ enabled or did not have the correct commit interval, which tasks did not fail their parent if they failed, etcetera, etcetera. The usual stuff. It would do so without complaining, repetitively, and without error.
Sure it took me a few days to get all the SQL on paper. Sure it was a little complex at first. But I can tell you it was a hell of a lot more fun than all the empty-headed days of clicking that I would have had to do instead. It saved me my sanity during that particular project and, even better: it is sure to continue doing so in many more to come…
Here’s a few tasty examples:
Shows all workflows that do not create a backwards compatible logfile (Informatica 8.5 and up)
SELECT OPB_SUBJECT.SUBJ_NAME AS FOLDER
, OPB_TASK.TASK_NAME AS WORKFLOW
, ‘Workflow does not create a backwards compatible logfile‘ AS ERROR
FROM OPB_TASK
, OPB_TASK_ATTR
, OPB_SUBJECT
WHERE OPB_TASK.TASK_TYPE = 71
AND OPB_TASK_ATTR.ATTR_ID = 12
AND OPB_TASK_ATTR.ATTR_VALUE <> 1
AND OPB_TASK_ATTR.TASK_ID = OPB_TASK.TASK_ID
AND OPB_SUBJECT.SUBJ_ID = OPB_TASK.SUBJECT_ID
– AND UPPER(OPB_SUBJECT.SUBJ_NAME) LIKE UPPER(’FOLDERNAME’)
ORDER BY 1,2
Lists all sessions which do not have a parameter value in $Source or $Target
SELECT OPB_SUBJECT.SUBJ_NAME AS FOLDER
, OPB_TASK.TASK_NAME AS TASK
, DECODE(OPB_TASK_ATTR.ATTR_ID
, 7, ‘$Source‘, 8, ‘$Target‘) AS PARAMETER
, OPB_TASK_ATTR.ATTR_VALUE AS CURRENT_VALUE
, ‘Value not set using parameter‘ AS ERROR
FROM OPB_SUBJECT
, OPB_TASK
, OPB_TASK_ATTR
WHERE OPB_SUBJECT.SUBJ_ID = OPB_TASK.SUBJECT_ID
AND OPB_TASK.TASK_ID = OPB_TASK_ATTR.TASK_ID
AND OPB_TASK_ATTR.ATTR_ID IN (7,8)
AND OPB_TASK.TASK_TYPE IN (68)
AND UPPER(OPB_TASK_ATTR.ATTR_VALUE) NOT LIKE ‘%$%‘
– AND UPPER(OPB_SUBJECT.SUBJ_NAME) LIKE ‘FOLDERNAME’
ORDER BY 1,2,3,4
Shows all uncommented transformations in mappings
SELECT distinct OPB_SUBJECT.SUBJ_NAME AS FOLDER
, OPB_MAPPING.MAPPING_NAME AS MAPPING
, OPB_WIDGET_INST.INSTANCE_NAME AS TRANSFORMATION
, ‘Missing comment in transformation‘ AS ERROR
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 3,
’SOURCE QUALIFIER transformation (user defined) should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 4,
’UPDATE transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 5,
’EXPRESSION transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 6,
’STORED PROCEDURE transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 7,
’SEQUENCE transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 8,
’EXTERNAL PROCEDURE transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 9,
’AGGREGATOR transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 10,
’FILTER transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 11,
’LOOKUP transformation (with user defined SQL) should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 12,
’JOINER transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 14,
’NORMALIZER transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 15,
’ROUTER transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 26,
’RANKER transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 50,
’ADVANCED EXTERNAL PROCEDURE transformation should be commented‘
, DECODE(OPB_WIDGET_INST.WIDGET_TYPE, 80,
’SORTER transformation should be commented‘ ,
’Transformation should be commented‘ ))))))))))))))) AS DESCRIPTION
FROM OPB_SUBJECT
, OPB_MAPPING
, OPB_WIDGET
, OPB_WIDGET_INST
, (SELECT DISTINCT OPB_WIDGET.WIDGET_ID
FROM OPB_WIDGET_ATTR, OPB_WIDGET, OPB_SUBJECT
WHERE OPB_WIDGET.WIDGET_ID = OPB_WIDGET_ATTR.WIDGET_ID
AND OPB_SUBJECT.SUBJ_ID = OPB_WIDGET.SUBJECT_ID
AND OPB_WIDGET.IS_VISIBLE = 1
AND OPB_WIDGET.WIDGET_TYPE IN (3,11)
AND OPB_WIDGET_ATTR.WIDGET_TYPE IN (3,11)
AND ATTR_ID = 1
AND ATTR_VALUE IS NOT NULL) T1
WHERE OPB_SUBJECT.SUBJ_ID = OPB_MAPPING.SUBJECT_ID
AND OPB_MAPPING.MAPPING_ID = OPB_WIDGET_INST.MAPPING_ID
AND OPB_WIDGET_INST.WIDGET_ID = OPB_WIDGET.WIDGET_ID
AND OPB_WIDGET_INST.COMMENTS IS NULL
AND OPB_WIDGET.IS_VISIBLE = 1
AND OPB_WIDGET.IS_REUSABLE <> 1
– Reusables do not need extra comments
AND OPB_WIDGET_INST.WIDGET_TYPE NOT IN (1,2,46,47)
— Source Definitions, Targets, Mapplet Input/Output
AND OPB_WIDGET.WIDGET_ID = T1.WIDGET_ID (+)
– SQ/LKP only need comments when User Defined
AND CASE WHEN OPB_WIDGET.WIDGET_TYPE IN (3,11)
AND T1.WIDGET_ID IS NULL THEN 1 ELSE 0 END = 0
ORDER BY 1,2,3
Comments
8 Responses to “Testing ETL or: Some More Fun with the Informatica Repository”
Leave a Reply







Some of these queries look quite familiar to me
Yep, but these queries DO work!
I wonder if such a tool exists in DataStage or Oracle Warehouse Builder.
These are very usefull.
Can i have a query which gives Lookupname, mapping name ,lookup table name and connection information?
Basicaly i need all details related to lookup from metadata.
Thanks,
I guess i got the data. I used this query if someone interested…
Thanks
Select distinct b.WIDGET_ID, a.mapping_name, b.INSTANCE_NAME Lkp_name, Decode(c.attr_id,2,c.attr_value) Table_name,
(c.attr_id,6,c.attr_value) src_tgt
FROM rep_all_mappings a,rep_widget_inst b, OPB_WIDGET_ATTR c
where a.mapping_id=b.mapping_id
and b.WIDGET_ID=c.WIDGET_ID
and a.subject_area=’xyz’
and b.WIDGET_TYPE=11
and c.WIDGET_TYPE=11
and c.ATTR_ID in (2,6)
How about queries to check the status of workflows? Do these exist or do I need to make them?
Thanks!
Jeff
could anybody give me an idea where to get an article regarding the internal tables,attributes and their description???
Thomas, I have been searching for a query that will help me do impact analysis for a table change? Do you have a query that could help me with that? You would be a life saver!