The Breakpoints of Big Data
Since this is my first post on the Syncsort blog, please allow me to briefly introduce myself as Steven Totman, Data Integration Business Unit Executive for Syncsort based out of the U.K.
I’m very lucky this week to be joined by Hal Lavender from Cognizant and Syncsort’s own Jorge Lopez (who you’ll no doubt recognize as one of our frequent bloggers) for the FIMA conference in London and a panel discussion we are hosting on “The Breakpoints of Big Data.” Syncsort also has a booth at FIMA, so if you are attending please make sure you stop over to see us and learn more about DMExpress.
Over dinner last night (there were no takers for a traditional British meal, so we ended up at a curry house), the three of us, along with Nejde Manuelian, Syncsort’s director of data integration sales for EMEA, got into an interesting discussion about when data actually became “big data.” One of our intro slides for our FIMA session traces back to the 1970s, when data was stored on punch cards holding just 80 bytes each. Having a “big data” problem at that time meant you needed a bigger cupboard for your cards and you had paper cuts from handling too many of them!
In the 1980s, when 3.5-inch floppy disks storing a massive 1.44 MB each were the norm, a “big data” problem meant your stack of disks, with Monkey Island and Wing Commander spread across 20 of them, fell over. In the commercial world, IBM came out with the 3380, storing an amazing 2.5 GB. As Jorge pointed out, Google now processes approximately 23 petabytes of new data a day! So when did “big data” actually start breaking our IT infrastructures?
Well, the reality is that “big data” has been breaking stuff for a while. At Syncsort, we regularly see customers for whom a mere terabyte of data is the breakpoint for their systems. For context, Hal has a few terabytes on his home desktop back in Texas, which he accesses from his iPad. It reminded me of a recent discussion with the CIO of a telecommunications company, who explained that thanks to issues with doing ELT (where you push transformations into the database), he was going to have to ask the CFO for 40 percent more nodes (at $500k a pop) on his data warehouse database just to handle the annual 10 percent data growth. When the CFO asks what he’s going to get for that $2 million, the CIO will have to tell him that he will continue to get the same reports he got yesterday, with no improvements. Not surprisingly, this CIO was not exactly excited about presenting this “business case” to his CFO.
“Big data” that breaks IT infrastructure (especially ETL tools!) has been a dirty little secret for years and is only now generating mainstream awareness. The number of customers using DMExpress to “accelerate” their existing ETL tools is testimony to that, and in my view “big data” is as valid a description for a five-person team with 10 terabytes of data as it is for a 500-person team with a petabyte.
If your company is combining data from multiple sources and it takes the IT team more than three months to add a new data source or create a new report (well above the average in a recent BeyeNETWORK survey), then chances are you have “big data.” The good news is that ever since the 1970s, when punch cards were just being phased out, Syncsort has been enabling customers to seamlessly drop our software into their existing environments to accelerate those environments and solve “big data” problems.
If “big data” is a new name for a long existing problem, then Syncsort with our ETL 2.0 approach can be a key part of the solution. Please come visit us at FIMA. We’ve been solving “big data” breakpoints for years.