Strata Conference 2014: The Weather and Hadoop Heating Up
Strata Santa Clara was a great reason to escape the polar vortex for a few days this winter. As usual, there was a great set of conversations and plenty to learn. I shared some of these takeaways in an interview on theCUBE on Tuesday, which you can view here. I did want to expand a bit on one of the key milestones Hadoop is crossing in 2014, one that is basic but quite fundamental: there is budget for Hadoop in the enterprise in 2014. Last year was the year of the Hadoop POC. What can it do? Where does it fit? 2014 is the year of the Hadoop project. Not all of these projects are well defined, but budget is allocated and it is being spent. There are some interesting implications to this important milestone that I think will drive a new wave of innovation.
As customers look to roll out their first production workloads, they are basing the rollout on a business case. That business case has a set of costs and a set of expected returns. I believe that as enterprises move through this process they will turn a keen eye to performance. Hadoop is low cost and it is scalable. It allows the co-location of vast amounts of data and processing power, and that is opening up opportunities for cost savings and new business insights never before possible. However, Hadoop is not particularly performant. That’s ok, you can just add more nodes, right? Actually, I think this is where we are going to see a change in perspective over the course of the year.
At the hardware level, we are not seeing customers choose low-end servers. Rather, we are seeing a consistent reach for relatively large nodes in terms of storage (20 to 30 TB), fast disks (10K RPM) and plenty of memory. At the Hadoop layer, we are seeing intense focus on projects like Tez, Stinger and Spark that will boost performance in various ways. Adding more nodes is certainly a way of scaling, but there are clear and significant opportunities to dramatically improve cluster performance by innovating at the storage, memory, networking and software layers. Many of these performance optimizations will also provide a more attractive ROI for customers than adding the equivalent capacity in the form of additional nodes. To date, customers haven’t focused their energy on throughput per node or other performance measures, but as they allocate budget, deploy production workloads and measure results against a business case, they are going to focus on getting the most out of the nodes they deploy. I think this will be a great forcing function for the industry to continue innovating. Here at Syncsort, we look forward to doing our part.
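To make the ROI argument concrete, here is a back-of-envelope sketch comparing the two paths to a throughput target: scaling out with more nodes versus improving throughput per node. All of the figures (node price, throughput, the 50% gain) are illustrative assumptions for the sake of the arithmetic, not vendor numbers or benchmark results.

```python
import math

def nodes_needed(target_throughput, per_node_throughput):
    """Whole nodes required to reach a target aggregate throughput."""
    return math.ceil(target_throughput / per_node_throughput)

# Hypothetical baseline: 20 nodes at 1.0 TB/hour each, $15k per node.
node_cost = 15_000
baseline_nodes = 20
baseline_per_node = 1.0          # TB/hour per node
target = 30.0                    # TB/hour required by the workload

# Option A: scale out at the baseline per-node throughput.
scale_out_nodes = nodes_needed(target, baseline_per_node)
scale_out_cost = (scale_out_nodes - baseline_nodes) * node_cost

# Option B: a software/storage optimization yielding +50% per node
# (the kind of gain projects like Tez, Stinger and Spark aim at).
optimized_per_node = baseline_per_node * 1.5
optimized_nodes = nodes_needed(target, optimized_per_node)
optimized_cost = max(optimized_nodes - baseline_nodes, 0) * node_cost

print(scale_out_nodes, scale_out_cost)   # 30 nodes, $150,000 in new hardware
print(optimized_nodes, optimized_cost)   # 20 nodes, $0 in new hardware
```

Under these assumed numbers, the optimization hits the target with the existing footprint, while scaling out requires ten more nodes plus their ongoing power, space and operations costs, which is exactly why throughput per node starts to matter once a business case is attached.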