
So You Want to Be a Data Integration Expert…

I recently came across a blog post from Susan Hall over at IT Business Edge on the “Seven Keys to Becoming a Data Integration Expert.” Naturally, the headline caught my attention and I soon learned that it was based on a recent post from David Linthicum on “Obtaining Mad Data Integration Skills.”

As I read through both of these posts, I started thinking: instead of keeping the seven keys in the order they were originally listed, what if I ranked them by how much time and money each one costs organizations in an average month? Here is what I came up with:

  1. Performance
  2. Data governance (this could arguably be number 1, but most organizations I have seen aren’t really doing wholesale data governance)
  3. Security
  4. Rules and routine
  5. Database concepts
  6. Interfaces to data
  7. Data mediation and transformation

David’s performance criterion reads: “…the ability to define how a data integration solution will perform over time. This is very important.” I couldn’t agree more! Building performance and scalability into a DI approach matters not only today, but also for the Big Data requirements of the future. David goes on to say that many DI approaches “become useless after several years.”

We see this every day with our customers and partners. When they hit the wall with their current approach, they often try one or more of the following:

  • Add hardware (CPU, memory) – this is expensive, adds to the software cost, and usually does not scale linearly
  • Fine-tune the approach/tool – this requires very senior IT staff and/or highly skilled (read: expensive) consultants from the vendor
  • Rip out the logic and push it into the database – now you have an ELT approach that pushes the cost and complexity into hundreds of lines of SQL and PL/SQL
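To make the third option concrete, here is a minimal sketch of what “pushing the logic into the database” (ELT) looks like. All table and column names are hypothetical, and SQLite stands in for whatever database an organization actually runs; the point is only that the transformation moves out of the integration tool and into SQL.

```python
import sqlite3

# Hypothetical ELT example: the load happens first, then the
# transformation runs inside the database as SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount REAL, region TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, 100.0, "east"), (2, 250.0, "west"), (3, 75.0, "east")],
)

# The transform (filter + aggregate) is expressed as SQL executed
# in-database, rather than row by row in the integration tool.
conn.execute(
    """
    CREATE TABLE region_totals AS
    SELECT region, SUM(amount) AS total
    FROM raw_orders
    WHERE amount >= 100
    GROUP BY region
    """
)
print(dict(conn.execute("SELECT region, total FROM region_totals")))
```

Even in this toy form you can see how the business logic ends up living in SQL; in a real deployment that SQL grows into the “hundreds of lines of SQL and PL/SQL” described above.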

Syncsort helps customers solve their performance and scalability issues without needing to resort to stop-gap measures that accelerate costs.

Thanks to Susan and David for their posts and the inspiration they provided me to write this one.  I look forward to following the discussion on their blogs and reading what they write about next. In the meantime, feel free to leave a comment or challenge me if you want to debate the way I’ve ranked the list above.

  • Partha — March 27, 2012 at 12:49 pm

Awesome post, Keith. I'd reorder your three approaches that customers take when they hit the wall:

    1. Fine-tune the approach/tool – Initially they will try to do it with their existing IT staff, fail miserably, and inject a few bugs into the code in the process. Then they will hire so-called “experts” who will eat up all their money and finally suggest a hardware upgrade.

    2. Add hardware / CPU / memory – In most cases this only shifts the real problem from one kind to another: where the workload was memory-bound before, it now becomes CPU-bound. Even worse, after adding more memory and CPU they may discover serious I/O contention in the I/O subsystem.

    3. Rip out the logic and push it into the database – Yep, that happens when the vendor’s consultants come in and propose buying a costly pushdown-optimization license, which, like the point above, only moves the bottleneck from one place to another.

    If you got your approach (read: “design”) wrong in the first place, there is little hope of getting things back to normal (although a loosely coupled approach to data integration may be the answer to some of these issues; check here…)

    • Keith Kohl — March 27, 2012 at 1:43 pm
      In reply to: Partha


      Thank you for the (prompt) reply! I absolutely agree with you on the order and your reasoning. Fine-tuning is definitely an expensive option that eats budget, especially budget earmarked for more strategic initiatives.

      I absolutely agree on the add-more-hardware item as well. We have seen organizations invest literally millions of dollars for 20, 30, or 40% more capacity, only to realize a fraction of that. That can’t and won’t scale.

      I don’t run into the rip-out-the-logic-into-the-database scenario very often. What I do see is organizations deciding not to make further investments in their current approach and instead making strategic investments in other, more efficient technologies, such as columnar databases, Hadoop, etc.

      Thanks again for the reply. Keep them coming!

