ETL? Or ET-hell?

ETL. Sigh.

We were talking with a large, multinational IT services firm about their data warehouse. They explained their process for getting data into the warehouse and getting reports out. It was something like this:

  • Define the report that you think you want
  • Define the structure of data to build that report
  • Find the systems that have the data to build that report
  • Extract the data (E)
  • Transform it from its native structure into the structure of the report (T)
  • Load into the data warehouse (L)
  • Run the report

Pretty straightforward, right? Well…

Here are the problems that they expressed to us:

  • The process is slow. They estimate around 100 man-hours of work to get this done.
  • And before you object, that number is representative of what we’ve heard from other organizations as well.
  • Over 90% of the time, the report that they thought they wanted, wasn’t exactly the right report
  • So they tweak and repeat. Maybe it’s a little faster this time, but let’s say conservatively it’s still over 60 more man-hours.

Data Warehouse ETL process

This is what we like to call “ET-Hell”

It shouldn’t be this hard. It shouldn’t take this long. It’s 2015. OK, we don’t have flying cars, but we do have the technology to make this better.

We can’t avoid the concept of ETL. It’s a necessary evil if we really want to bring information together from various sources. Business applications, the systems of record, have structures for a reason – to support the business process of each system. And typically, you don’t have multiple copies of the same system within an organization. So we’re all stuck with ETL.

How can we do ETL better, cheaper and faster? Focus on the T and the L.

We probably can’t get rid of the E, because these systems of record are there for a reason. They will continue to exist. But the T can be made simpler. At FlockData, our T process involves simply describing the data in a descriptor document. We reference the exported data, together with its descriptor, to perform the L.

What happens when we encounter that same report problem – the reported we really want isn’t the report we got, the report we thought we wanted. No problem – we just tweak and iterate. But using FlockData, that really just means modifying the descriptor and re-running the L. We can reduce the whole process, from start to finish, by an order of magnitude.

FlockData – your “get out of hell” free card. 😉

Cross-posted from LinkedIn Pulse “ETL? Or ET-hell?”