Extreme Data

I was on the Underground in London last week, on my way back from visiting a financial services customer, when I overheard a couple of well-dressed gents carrying brollies (is it only us Brits who leave the house assuming it will rain, no matter how nice it is outside?) musing over the old adage that the only things you can count on are taxes, death and trouble (as Marvin Gaye put it in “Trouble Man”).

Their conversation got me thinking that, instead of trouble, there is another thing you can rely on today: data is only going to get bigger. I would argue that the amount of useful information to be gleaned from this data is not growing at the same exponential rate. However, regardless of whether you consider your data ‘Big Data’ or not, you have to do a lot more “work” on your data as it grows to extract business-relevant, valuable information from it.

A good example of this is close to the heart of those of us affected by the Eurozone (I’m intentionally avoiding the long debate over whether the UK is actually a member, given that we’ve kept our own currency but are paying to support the euro). The worldwide financial crisis rapidly accelerated new regulations and controls on markets and companies. In Europe, we already had Solvency II, Basel I and Basel II, and now Basel III. These regulations are becoming incredibly complex.

Calculations on “extreme” data volumes are required to remain compliant and keep senior executives from going to jail. In this case, picking the right ETL tool can be like receiving a “get out of jail free” card in Monopoly.

So why are the required calculations so complex? For starters, here in Europe we love detailed regulation, as evidenced by Commission Regulation (EC) No 2257/94, which states that bananas must be “free from malformation or abnormal curvature.” In the case of “extra class” bananas, there is no wiggle room, but “class 1” bananas can have “slight defects of shape,” while “class 2” bananas can have full-on “defects of shape.” Yes, that’s right. We have regulations about the shape and curvature of bananas, and don’t even get me started on cucumbers (Commission Regulation (EEC) No 1677/88), where “extra class” and “class I” cucumbers are allowed a bend of 10mm per 10cm of length. Class II cucumbers can bend twice as much. So you can imagine how detailed our calculations must be for something like risk!

About two years ago, I was heavily involved with a very smart team working on industry models. To keep up with them, I decided I had to read and understand the Basel II regulations. All I will say is that whenever someone mentions they are working on a Basel project, it brings back horrible memories. I remember it being 4 a.m. on the first day of my “reading project” when I realised my brain hurt and that the scroll bar on the document didn’t look like it had moved. Tying this back to data integration, the point is that it’s definitely not just the volume of data that causes problems for customers. More often than not, it’s the complexity of the calculations or transforms they are dealing with.

Often when I’m speaking with people about data integration acceleration (a good example was the bank I visited earlier this week), they respond that “our data isn’t really that big.” When pressed on how long it takes them to process their data and whether this satisfies the business, people usually pause, and you can see the wheels turning in their heads. This is regularly followed by an admission that they are in fact exceeding their service level agreements. The next question is how much data growth they are seeing and whether they are prepared for it. After an even longer pause comes something like “we plan for 20 percent growth” (a commonly accepted average). However, I’ve heard numerous companies admit that actual data growth could range from 10 percent all the way up to 600 percent! But no one ever says their data isn’t growing. Inevitably, the conversation ends up focusing on how much time they spend tuning their existing environment, how much hardware they are buying, and how they have no better option than to push transformations into the database.
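To make the gap between a 20 percent growth plan and the growth rates customers actually report a little more concrete, here is a minimal Python sketch. It assumes constant annual compound growth and a batch job whose throughput never improves (no tuning, no new hardware). The 10 TB starting volume, 2 TB/hour throughput and 8-hour batch window are hypothetical numbers chosen for illustration, not figures from any customer, and reading “600 percent growth” as volume multiplied by seven in a year is itself an interpretation.

```python
from typing import Optional


def projected_volume(current_tb: float, annual_growth: float, years: int) -> float:
    """Volume after `years` of compound growth (0.20 means 20 percent per year)."""
    return current_tb * (1.0 + annual_growth) ** years


def batch_hours(volume_tb: float, throughput_tb_per_hour: float) -> float:
    """Run time assuming throughput stays fixed (hypothetical simplification)."""
    return volume_tb / throughput_tb_per_hour


def first_year_over_window(current_tb: float, annual_growth: float,
                           throughput_tb_per_hour: float, window_hours: float,
                           max_years: int = 10) -> Optional[int]:
    """First year in which the batch run no longer fits the window, or None."""
    for year in range(max_years + 1):
        volume = projected_volume(current_tb, annual_growth, year)
        if batch_hours(volume, throughput_tb_per_hour) > window_hours:
            return year
    return None


if __name__ == "__main__":
    # Hypothetical starting point: 10 TB at 2 TB/hour uses 5 hours of an 8-hour window.
    for rate in (0.10, 0.20, 6.00):
        year = first_year_over_window(10.0, rate, 2.0, 8.0)
        print(f"{rate:.0%} annual growth: window missed in year {year}")
```

Under these assumptions, 10 percent growth breaks the window in year 5, the “planned” 20 percent breaks it in year 3, and at 600 percent next year’s run takes 35 hours instead of 5.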

It is always a bit amusing, and always very satisfying, when the same people who were saying they don’t have ‘Big Data’ are suddenly advocating for why data integration acceleration is needed and makes a lot of sense. Instead of reminding them of what they said in the first place, I simply smile and mention how much money they will save as well.

Perhaps I should revisit the title of my post. The three things that are guaranteed are death, taxes and data breaking your data integration infrastructure. If you are already using DMExpress, you can forget about the last one, since we have you covered. Everyone else, you are invited to have one less thing to worry about. As for the whole death and taxes thing… sorry, we can’t help there!


Big Data? Bring it On!

November 1, 2010

More than ever, data is permeating every aspect of our lives, from the way we do business to the individual choices we make, such as picking a specific restaurant or a consumer product. More than ever, we all play a key role as producers and consumers of information. This presents an enormous challenge to organizations that must transform this data into much-needed insights: insights to grow, to compete, to survive.

Last week, hundreds of organizations from all around the world gathered at the annual Teradata Partners user conference in San Diego to learn and exchange ideas on how to leverage one of their most valuable assets: their data. Across hundreds of sessions, analysts and attendees from all sectors and industries kept returning to a few common themes:

  • Extreme data volumes: every day organizations collect a tremendous amount of data, coming from multiple and often disparate sources
  • Accessibility of information: new paradigms such as mobility and collaboration are increasing the accessibility of information, making information available to thousands of previously under-served users
  • Effective data integration strategies: extreme data volumes coupled with new and revolutionary paradigms for accessing and delivering information will demand extreme data integration solutions

To date, most data integration strategies have focused solely on solving functional problems, neglecting the need for fast performance with extreme data volumes. As a result, organizations increasingly face critical performance bottlenecks that can hinder their ability to capitalize on their IT investments, and thus to operate and compete. For instance, a major health care management organization must deliver hundreds of thousands of reports containing daily claims to insurance companies. Failure to deliver these reports to its customers can result in critical errors in administering and delivering health benefits to individuals. Businesses in every industry and market face similar performance bottlenecks across a wide range of processes.

Three weeks ago I joined a company that, along with great people and unique technology, has set out to address the data integration “acceleration” challenge in a world of extreme data volumes and increasing demands for information: how can existing data integration processes be accelerated to fit into ever-shrinking batch windows, and how can companies harness the power of their data?

Organizations that can build holistic data integration strategies, looking not only at functional but also at performance requirements, to put timely, relevant information into the hands of everyday, non-technical users will succeed in capturing new markets and revenue opportunities.

Is your organization ready to take the challenge? The journey promises to be not only challenging but also exciting and rewarding. There’s no place I would rather be today.

The industry says “Big Data.” I say, “Bring it on!” See you next time at Syncsort!


At Syncsort, we understand that exploding data volumes and shrinking IT budgets are forcing organizations to get more out of their data, and even more out of their data integration and data protection investments. That is why we are focused on helping many of the world’s largest organizations rethink the economics of their data, so they can take advantage of unprecedented opportunities to unlock revenue and competitive advantage.

There are many exciting developments in the data integration and data protection markets, and we look forward to using this platform to host conversations with customers, partners, industry experts and the larger data integration and data protection ecosystems. We’ll be sharing our perspectives on industry trends, the latest developments with our extreme data performance solutions, and much, much more.

I’m pleased to welcome you again to the official Syncsort blog! We look forward to hearing from you and encourage your thoughts and feedback.


If you’re reading this blog, you’re probably involved in some way with server virtualization. It is without question the biggest IT trend, and adoption numbers are pushing 100%. And with good reason: virtualization works. It saves money, and plenty of it (virtualizing a few hundred servers can actually save millions over time). It gives you much greater IT agility (need a new server? click, click, done!). And it makes data protection so much easier.

Hah, fooled you on that last one.  

Many users are finding that the fly in the ointment of virtualization is data protection. Most people simply take their current backup solution and drop it onto virtual machines. Then they watch it collapse under its own weight.

The fact is that traditional, file-based backups don’t work well in the virtual world. They have always relied on dedicated hardware and unused system resources (the typical physical server runs at about 15% of capacity most of the time). But in virtualization, everything is shared: CPU, memory, network bandwidth, disk I/O. Every virtual machine competes for resources, and very little is left for backup processing. If your backup agent assumes it can grab all the compute cycles it needs, things can get ugly.

Fortunately, the situation isn’t hopeless, provided you rethink how you do your backups in the new, virtual world. We’ve published some of our thoughts on this subject in Business Computing World, in an article titled “The Five Imperatives for Extreme Data Protection in Virtualized Environments.”

What do you think? We’d love to hear your thoughts, plus any interesting stories about life in the trenches with VM data protection.
