
Three Guarantees – Taxes, Death and Bigger Data

I was on the underground in London last week on my way back from visiting a financial services customer when I heard a couple of well-dressed gents carrying brollies (is it only us Brits that leave the house assuming it will rain no matter how nice it is outside?) musing over the old adage that the only things you can count on are taxes, death and trouble (as captured in this Marvin Gaye song).

Their conversation got me thinking that instead of trouble, there is actually another thing you can rely on today – that data is only going to get bigger. I would argue that the amount of useful information to be gleaned from this data is not growing at the same exponential rate. However, regardless of whether you consider your data ‘Big Data’ or not, you actually have to do a lot more “work” on your data as it grows to extract business-relevant, valuable information from it.

A good example of this hits close to home for those of us impacted by the Eurozone (I’m intentionally avoiding the long debate as to whether the UK is actually a member, given we’ve kept our own currency but are paying to support the euro). The worldwide financial crisis caused a rapid acceleration of new regulations and controls on markets and companies. In Europe, we already have Solvency II and Basel I, Basel II and now Basel III. These regulations are getting incredibly complex.

Calculations on “extreme” data volumes are required to remain compliant and keep senior executives from going to jail. In this case, picking the right ETL tool can be like receiving a “get out of jail free” card in Monopoly.

So why are the calculations required so complex? For starters, here in Europe we love them, as evidenced by European Commission Regulation (EC) 2257/94, which states that bananas must be “free from malformation or abnormal curvature.”  In the case of “extra class” bananas, there is no wiggle room, but “class 1” bananas can have “slight defects of shape” while “class 2” bananas can have full-on “defects of shape.” Yes, that’s right. We have regulations about the shape and curvature of bananas, and don’t even get me started on cucumbers (Commission Regulation (EEC) No 1677/88), where “class I” and “extra class” cucumbers are allowed a bend of 10mm per 10cm of length. Class II cucumbers can bend twice as much. So you can imagine how detailed our calculations must be for something like risk!

About 2 years ago, I was heavily involved with a very smart team working on industry models. To keep up with them, I decided I had to read and understand the Basel II regulations. All I will say is that whenever someone mentions they are working on a Basel project, it brings back horrible memories. I remember it being 4 a.m. on the first day of my “reading project” when I realised my brain hurt and that the scroll bar on the document didn’t look like it had moved. Tying this back to data integration, the point is that it’s definitely not just the volume of data that causes the problems for customers. More often than not, it’s the complexity of calculations or transforms they are dealing with.

Often when I’m speaking with people about data integration acceleration (a good example was the bank I visited earlier this week), they will respond that “our data isn’t really that big.” When pressed on how long it takes them to process their data and whether this satisfies the business, people usually pause and you can see the wheels turning in their heads. This is regularly followed by an admission that they are in fact exceeding their service level agreements. The next question is to ask how much data growth they are seeing and whether they are prepared for it. After an even longer pause comes something like “we plan for 20 percent growth” (a commonly accepted average). However, I’ve heard numerous companies admit that actual data growth could range from 10 percent all the way up to 600 percent! But no one ever says their data isn’t growing. Inevitably, the conversation ends up focusing on how much time they spend tuning their existing environment, how much hardware they are buying, and how they have no better option than to push transformations into the database.
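
To make that gap concrete, here is a quick back-of-the-envelope sketch in Python, with purely hypothetical numbers, of how fast a fixed batch window gets consumed when processing time scales roughly with data volume and growth compounds year over year:

    # Back-of-the-envelope: years until a fixed batch window is blown, assuming
    # processing time scales linearly with data volume and volume compounds
    # annually. The figures are hypothetical illustrations, not measurements.

    def years_until_window_blown(hours_today, window_hours, annual_growth):
        """Count whole years until the nightly batch no longer fits its window."""
        years, hours = 0, hours_today
        while hours <= window_hours:
            hours *= 1 + annual_growth
            years += 1
        return years

    for growth in (0.20, 0.60, 2.00):  # 20%, 60% and 200% annual growth
        years = years_until_window_blown(4.0, 8.0, growth)
        print(f"{growth:.0%} growth: an 8-hour window is blown in {years} year(s)")

At 20 percent growth, a four-hour job takes four years to outgrow an eight-hour window; at 200 percent it happens in the first year.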

It is always a bit amusing, and very satisfying, when the same people who were saying they don’t have ‘Big Data’ are suddenly advocating for why data integration acceleration is needed and makes a lot of sense. Instead of reminding them of what they said in the first place, I simply smile and mention how much money they will be able to save from it, as well.

Perhaps I should revisit the title of my post. Three things that are guaranteed are death, taxes and data breaking your data integration infrastructure. If you are already using DMExpress, you can forget about the last one since we have you covered. Everyone else, you are invited to have one less thing to worry about. The whole death and taxes thing… we are sorry, but we can’t help there!

3 comments
  • Fred JACQUET — November 22, 2011 at 10:24 am

    I just would like to clarify the term ‘Big Data’, as I see it.
    Indeed, it does not only refer to volume considerations.
    First, the single concept of volume can vary; ‘Big’ also depends on the size of the company and the time available to process the data.
    On the other hand, unstructured data, multi-structured data, web logs, and other types of content stored in RDBMS as well as in file systems contain data with high added value. Big Data also refers to such data.
    Also, regarding Big Data processing, it is necessary, beyond the capacity to move data (ETL) and store data (MPP RDBMS), to be able to ‘understand’ and process the data in all the complexity of its structures, using alternative technologies and methodologies, like the ability to perform analytics within the RDBMS (in-database processing).
    [See the combination of Teradata and Aster Data: http://www.teradata.com/product.aspx?id=17681]
    [See also Gartner’s Big Data definition including ‘Volume’, ‘Variety’ and ‘Velocity’]

    • Steven Totman — November 22, 2011 at 5:03 pm
      In reply to: Fred JACQUET

      Hi Fred – nice to see a blast from the past – It was fun working at Ardent/Ascential with you.
      Regarding your Big Data definition comment, I completely agree it’s far more than volume – in fact I put together a word map for a presentation at FIMA in London a few weeks ago showing the wide variety of terms that analysts and companies use to define “Big Data” (Gartner has a report out describing how Big Data is the beginning of Extreme Information Management). You’re also completely right that it’s the combination of hardware, ETL tool and database that is critical to handling Big Data.
      The key is to do the right work in the right place – we do see a lot of customers that are doing ELT type transformations in the database (simply because most other ETL tools can’t handle it).
      I am in Paris frequently – be great to catch up next time I am in town.

  • Sam Berg — December 7, 2011 at 2:34 pm

    We are seeing the other end of the extreme with the velocity of change to the transactional data that ends up on someone’s big data platform. At VoltDB, we are seeing an emerging use case of ingesting a firehose of transactions, doing some DB work on them for real-time analytics, but also some light ETL to prepare this temporally meaningful data to be added in-stream to Hadoop, Vertica, etc. In this case, the data size is not too large, but an update velocity of tens of thousands to millions of records per second can generate more valuable analysis of the historical data set. The challenge is to introduce fast data to big data and make something useful out of it.
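
The pattern Sam describes (take a high-velocity stream, apply a light transform in flight, and hand the records off in micro-batches to a big data store) might be sketched roughly as follows in Python. The field names, batch size and file-based sink are hypothetical stand-ins for a real queue consumer and a real Hadoop or Vertica loader.

    # Rough sketch: ingest a high-velocity stream of records, apply a light
    # in-stream transform, and flush micro-batches to a downstream store.
    # The record fields, batch size and file sink are hypothetical stand-ins.
    import json
    import time

    BATCH_SIZE = 10_000  # flush every N records; tuning depends on velocity

    def light_etl(record):
        """Minimal in-flight preparation: stamp ingest time, normalise a field."""
        record["ingested_at"] = time.time()
        record["symbol"] = record.get("symbol", "").upper()
        return record

    def flush(batch, path="staged_batches.jsonl"):
        """Stand-in for a real sink (an HDFS put, a Vertica COPY, etc.)."""
        with open(path, "a") as out:
            for rec in batch:
                out.write(json.dumps(rec) + "\n")

    def ingest(stream):
        """Consume an iterable of dicts (e.g. a queue consumer) in micro-batches."""
        batch = []
        for record in stream:
            batch.append(light_etl(record))
            if len(batch) >= BATCH_SIZE:
                flush(batch)
                batch.clear()
        if batch:
            flush(batch)  # flush the remainder on shutdown

In practice the flush step is where the hand-off to the big data platform happens, and it is the part most sensitive to the update velocity Sam mentions.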
