In case you were feeling gloomy…
Posted on March 25, 2009 by Thomas
Filed Under Trends | Leave a Comment
For all of you out there who are feeling the chilly breeze of the global economic downturn: here’s an interesting read from Down Under to make you feel a little better.
Ab Initio - The Origins Of Their Secrecy
Posted on October 4, 2008 by Thomas
Filed Under Ab Initio, Editorial | 2 Comments
Ah… Ab Initio. The mystery guest with the cool leather jacket at the salsa party of ETL tools, who doesn’t say much but always ends up attracting all the girls… If only because all the other guests have nerdy glasses and keep trying to explain the virtues of enterprise-wide data synchronization.
They are secretive. They are suave. They are a mystery.
Call it a hobby. A company that acts like it is operating in a world of cold war spy agencies rather than the slightly dry and uneventful theatre of database extraction is almost bound to make one curious. On one forum I recently saw a consultant wail with indignation, saying they would not even respond to a potential sale before he signed a non-disclosure agreement. On another I read a researcher declare that he would be excluding them from his market overview for the very same reason. They won’t talk, they won’t show, and if you do talk about them there is always the possibility that there will be men in matching raincoats coming to have a friendly, private, and potentially terminal chat with you. One cannot but stop and wonder: why? Why, for goodness’ sake, would a company doing business in a competitive market want to act this way? What’s the point of it?
As it turns out, there really is a reason. Or at least an explanation. I was most pleased when I found this article, titled “The Rise and Fall of Thinking Machines”, from 1995. It makes for good reading.
BI Research: Factors influencing the fit between BI end-user and BI solution
Posted on August 12, 2008 by hessel
Filed Under Editorial, Various | Leave a Comment
Do you know your BI end-user(s)? Are you wondering why your BI applications are barely used? Do you think technology is enough? Do you want to get maximum benefit from your BI investment(s)? Over the next six months, a graduate student of Utrecht University, the Netherlands, will be conducting a scientific research project at Capgemini to answer these questions. The (main) goal of this research is to gain insight into the factors influencing the fit between the Business Intelligence solution and its end-user.
The ETL Generator - A Holy Grail (or is it?)
Posted on June 26, 2008 by Thomas
Filed Under Data Warehousing, Super Fancy Sexy, Trends | 1 Comment
For some reason I can’t quite fathom myself either, I spend considerable amounts of time in all manner of data warehouse / ETL related product development groups. (A substantial bit of that time, I should add, being my own; major system integrators place great value on R&D - just as long as it doesn’t eat into the billable hours.) And whether they call themselves Focus Groups, Special Interest Groups or Knowledge Groups, what they are all basically looking for is the answer to one simple question: how can we do what we already do, only better? And of course, when we say better, we mostly mean faster and, above all, cheaper. If your project leaders are anywhere near as sharky as mine, you’ll find that sooner or later you have to come up with a good reason why this ETL stuff is taking so bloody long.
Not that it is unfair of them to ask. I mean, let’s face it: the Business Intelligence projects that we work on generally come with price tags that would enable you or me to spend the rest of our lives on pearly white beaches, were the money to be deposited into our bank accounts directly. And while we’re at it, let’s double face it: a substantial amount of that money - up to 75%, I’ve heard it said - is spent not on the reports that are the end result, but on building the data warehouse and the ETL that fills it with juicy numbers. And to top it all off, let’s triple face it: a whole lot of those data mappings we were building in that time really weren’t all that complicated. So the question, at least, is a valid one.
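To make the idea of generating those simple mappings a little more concrete, here is a minimal sketch in Python. It assumes a hypothetical mapping format: ColumnMapping, generate_insert_select and the table names are illustrative and not taken from any ETL product. Each target column is paired with a source expression, and the generator turns the list into a single INSERT ... SELECT statement.
```python
from dataclasses import dataclass


@dataclass
class ColumnMapping:
    """One target column and the source expression that fills it (hypothetical format)."""
    target: str
    source_expr: str


def generate_insert_select(target_table: str, source_table: str,
                           mappings: list[ColumnMapping]) -> str:
    """Generate a plain INSERT ... SELECT for a straight-through mapping.

    Only handles one source table with per-column expressions; joins,
    lookups and history handling are deliberately left out of this sketch.
    """
    if not mappings:
        raise ValueError("at least one column mapping is required")
    target_cols = ", ".join(m.target for m in mappings)
    # Align continuation lines under the first select item ("SELECT " is 7 chars)
    select_list = ",\n       ".join(f"{m.source_expr} AS {m.target}" for m in mappings)
    return (f"INSERT INTO {target_table} ({target_cols})\n"
            f"SELECT {select_list}\n"
            f"FROM {source_table};")


if __name__ == "__main__":
    mappings = [
        ColumnMapping("customer_id", "cust_no"),
        ColumnMapping("customer_name", "TRIM(cust_name)"),
    ]
    print(generate_insert_select("dim_customer", "stg_customers", mappings))
```
Real ETL generators also have to cope with joins, lookups, slowly changing dimensions and error handling, which is where the “holy grail” gets considerably harder to reach than this sketch suggests.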
Profiling your data
Posted on June 22, 2008 by tom
Filed Under Analysis | Leave a Comment
Profiling the data can be described as getting an overview of the data. This overview gives you an indication of the minimum, the maximum, the average, the number of distinct values, etc. From such an overview, one can tell whether the data comply with certain business rules. One also gets a feel for the data: if we have amounts, are they expressed in cents or euros? If weights are involved, are they expressed in grams, kilos or tons?
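As a rough illustration of these key statistics, the following Python sketch profiles a single attribute held in memory. The function name profile_column and the convention that None marks a missing value are assumptions for this example; real profiling tools compute a good deal more (frequency distributions, patterns and so on).
```python
from statistics import mean


def profile_column(values):
    """Compute basic profiling statistics for one attribute.

    Assumption: None marks a missing value. Sources may use other markers
    (empty strings, sentinel values), which would need extra handling.
    """
    present = [v for v in values if v is not None]
    # Only numbers get min/max/avg here; bool is excluded even though it subclasses int
    numeric = [v for v in present
               if isinstance(v, (int, float)) and not isinstance(v, bool)]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if numeric:
        profile.update(min=min(numeric), max=max(numeric), avg=mean(numeric))
    return profile


if __name__ == "__main__":
    print(profile_column([1250, 999, None, 1250, 40000]))
```
On the amounts [1250, 999, None, 1250, 40000] it reports one null, three distinct values, a minimum of 999 and a maximum of 40000. Whether figures like these look like cents or euros is exactly the kind of hunch the profile is meant to trigger.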
A mathematician regards data profiling as an easy job: only key statistics (frequencies, averages) are derived, and no advanced mathematics is involved. I remember the face of a mathematician who, after I had explained what data profiling was, asked: what the hell are you talking about? Data profiling is of more interest to the analyst, who wants to know whether the data comply with business rules and who would like to get better acquainted with the data.
Data profiling is mostly done during the analysis phase of an ETL project, when data are to be extracted from source systems and loaded into a data warehouse. Before building starts, we need to know the likely content of the data. Based on this knowledge, one can design a dataflow that takes the particularities of the data into account. For example, if we know from the analysis that dates of birth are subject to human error, we might include a check to see whether the date of birth is within an acceptable range.
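A date-of-birth range check of this kind could look something like the following minimal sketch. The acceptable range used here (not in the future, at most 120 years before a reference date) is an assumption for illustration, as are the names is_plausible_birth_date and split_on_birth_date and the date_of_birth field; in a real project the bounds would come from the business rules uncovered during profiling.
```python
from datetime import date, datetime


def is_plausible_birth_date(dob, reference=None, max_age_years=120):
    """Return True if dob lies between (reference - max_age_years) and reference.

    The bounds are illustrative assumptions, not an established business rule.
    """
    if reference is None:
        reference = date.today()
    try:
        earliest = reference.replace(year=reference.year - max_age_years)
    except ValueError:
        # reference is Feb 29 and the target year is not a leap year: use Feb 28
        earliest = reference.replace(year=reference.year - max_age_years, day=28)
    return earliest <= dob <= reference


def split_on_birth_date(records, reference=None):
    """Split records (dicts) into accepted and rejected lists on date_of_birth."""
    accepted, rejected = [], []
    for rec in records:
        dob = rec.get("date_of_birth")
        # datetime subclasses date but cannot be compared with a date, so only
        # plain date values are checked; anything else (None, strings) is rejected
        valid_type = isinstance(dob, date) and not isinstance(dob, datetime)
        if valid_type and is_plausible_birth_date(dob, reference):
            accepted.append(rec)
        else:
            rejected.append(rec)
    return accepted, rejected
```
Routing implausible records to a separate reject list, rather than silently dropping them, keeps them available for someone to inspect and correct at the source.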
Previously, such analysis was done through a series of queries that calculated the key statistics for every attribute in the database. Nowadays, alternatives have arisen that make life less complicated. I will mention three of them:
Oracle Warehouse Builder (OWB) contains a wizard that creates a nice overview of an external table that has been imported into OWB.
SQL Server 2008 also includes a wizard for data profiling.
Lastly, a freely downloadable tool is available from “arrah technologies”. A fully functional version can be used for a limited period; after that, it has to be bought. To install it, you need exactly the required version of Java (1.5.0_09). Getting it started is also somewhat tricky: it starts with the command “java Profiler”, which is not documented. Their tutorial shows a different command line, which did not get it running. Finally, the database connection has to be set up manually, which can also be tedious. But after all that, we have an intuitive profiling tool that does what it should do: give insight into the data by means of key statistics.
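The older, query-based approach, one profiling query per attribute, can be sketched with Python’s built-in sqlite3 module. This is only an illustration against SQLite (PRAGMA table_info is SQLite-specific; other databases expose column metadata through their own catalog views), the function name profile_table is hypothetical, and it says nothing about how the tools above work internally.
```python
import sqlite3


def profile_table(conn, table):
    """Run one profiling query per column and collect key statistics.

    Names are interpolated into SQL, so the table name must be trusted;
    column names are taken from SQLite's own catalog via PRAGMA table_info.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk)
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    result = {}
    for col in columns:
        query = (f'SELECT COUNT(*), COUNT("{col}"), COUNT(DISTINCT "{col}"), '
                 f'MIN("{col}"), MAX("{col}") FROM "{table}"')
        total, non_null, distinct, lowest, highest = conn.execute(query).fetchone()
        result[col] = {"count": total, "nulls": total - non_null,
                       "distinct": distinct, "min": lowest, "max": highest}
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                     [(1, "Jansen", 1250), (2, "de Vries", None), (3, "Jansen", 40000)])
    for column, stats in profile_table(conn, "customers").items():
        print(column, stats)
```
Generating the queries from the catalog, rather than writing them by hand, is what made this approach bearable for wide tables, but it is still a lot of plumbing compared with a wizard.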
This article draws heavily on information from Henk Jan te Brake. My thanks to him; any errors in this short overview are mine!