The Major Theme for Big Data Integration in 2014?…Big Iron and EDW Offload
As we move into 2014, I am reflecting on where Hadoop stands today vs. this time last year. At the beginning of 2013, coming out of the New York Strata event in October of 2012, the market was hyper-focused on SQL on Hadoop. No fewer than six separate announcements were made offering this approach, headlined by Cloudera’s launch of Impala. This was a nod more than anything to both the skills gap that was preventing adoption and an interaction style that would ensure organizations could rapidly gain value with Hadoop. In short, the underlying theme in this focus was how to make Hadoop accessible in the enterprise for real-world work, and SQL access was clearly going to be a foundational piece.
A year later, there are multiple options available for SQL on Hadoop, and while several are still early in their life cycles, they offer significant capability and are being used to great effect in large production deployments. It’s probably going too far to say SQL on Hadoop is completely solved, but it is safe to say that enterprises will have the robust SQL access they desire. The rapid identification of requirements and the significant investment to address them are, in fact, among the most powerful aspects of the Hadoop ecosystem. The rapidly maturing ability to run SQL on Hadoop is an example of the swift evolution that gives organizations conviction that this is the platform to invest in.
So as I consider the conversations I had at last year’s Strata conference in New York and subsequent discussions, the most common theme is offload. There are many names for this: rationalization, augmentation, etc. The basic concept is how to move data and processing workload from a high-cost environment (most commonly ELT in the data warehouse) to a low-cost environment (Hadoop) and at the same time lay the foundation for an enterprise data hub. The conversation has moved from “how would my team access data in Hadoop” to “how can I use Hadoop.” Importantly, it’s not just how to use the platform but also how to start in a place that has an immediate ROI. There is also an underlying implication of this trend: Hadoop is growing up. Hadoop is no longer relegated to internet companies with no legacy systems; it is being adopted by enterprises that are exploring how they can lay the foundation for Big Data analytics and save money at the same time.
At Syncsort, we have seen this trend from a unique angle: the mainframe vantage point, looking out across the enterprise. With the launch of our Hadoop-based data integration solutions (DMX-h for on-premises deployments and Ironcluster on Amazon’s EMR), we have been focused in part on how to help enterprises move data and processing from Big Iron to Big Data. The reception to this message has been dramatic. Again and again, I hear that what businesses want to do, and have for so long not been able to afford, is combine long histories of operational data with semi-structured and third-party data. They have been stymied by the cost of the data warehouse, the lead times to implement new sources and views of data, and an inability to explain exactly what they need in advance of actually analyzing the data. Hadoop overcomes these obstacles given its price point and its architecture. However, it’s not exactly clear how you get the mainframe data there. This challenge is a subtheme of the offload trend: according to feedback from our customers, as much as 70% of the data in a typical enterprise data warehouse is sourced from the mainframe. We are working with customers on how to solve this challenge and will continue to expand our capabilities here in 2014. We believe that by enabling easier access to mainframe data, we will help customers overcome one of the more challenging aspects of executing on their offload strategies: gaining the flexibility to move data simply and efficiently between the mainframe and other enterprise data stores.
The industry has moved from asking “how do I access and interact with Hadoop” in 2013 to “how do I use it in the enterprise to create new insights and save money” in 2014. The ecosystem was successful in answering the question for 2013. There is little doubt that it is up to the task of addressing the requirements to enable offload in 2014. Certainly we will be doing our part here at Syncsort.