Hadoop: Uncovering the Elephant
Over the last week or so, I have read several articles about the challenges organizations face when deploying Hadoop. This is something that certainly caught my eye since the majority of articles and blog posts I’ve read about Hadoop have been more focused on the opportunities and promises rather than the challenges. As an aside, all the talk about elephants has brought back memories of Undercover Elephant, the good old 70s cartoon that Hanna-Barbera produced about an inspector elephant that helped to solve mysteries. But, I digress…
One of the articles from The Wall Street Journal, “Hadoop Has Promise but Also Problems,” was especially interesting since it featured commentary from executives at two major global organizations regarding Hadoop’s challenges around scalability and ease of use.
Let’s talk about scalability first. Hadoop achieves scalability by deploying a large number of (arguably) cheap commodity servers, allowing the framework to distribute work among them for increased performance at scale. Of course, adding commodity hardware running open source software looks like a much more cost-effective proposition than adding nodes to a high-end proprietary database appliance. However, the amount of hardware required to cope with growing data volumes and performance SLAs can grow significantly, so it is not uncommon to find Hadoop deployments with a massive number of nodes. This elevates not only capital costs but also operational costs for hardware maintenance, cooling, power and data center space. As one of the executives in the article notes, it also complicates things “…partly because it requires that engineers deploy software across lots of different servers.”
This leads perfectly into a discussion of ease of use, which hits at the core of one of the major challenges nearly every organization working with Hadoop is facing: Hadoop is not easy to develop for. Among other things, coding MapReduce jobs and tuning sort operations require very specific skills that are not only expensive but also very difficult to find. For many organizations, this is the most significant barrier to Hadoop adoption.
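For readers less familiar with the programming model, here is a minimal sketch, in Python with made-up input data, of the map/shuffle/reduce flow behind even the simplest job (the canonical word count). A Hadoop developer must express all of this, plus job configuration, serialization and cluster plumbing, in Java MapReduce code:

```python
from collections import defaultdict

# Hypothetical input: lines of text, as mappers would receive them from input splits.
lines = ["hadoop scales out", "hadoop is not easy", "scales out fast"]

# Map phase: each mapper emits (key, value) pairs -- here, (word, 1).
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle phase: the framework sorts and groups all values by key
# across every mapper before handing them to the reducers.
shuffled = defaultdict(list)
for word, count in mapped:
    shuffled[word].append(count)

# Reduce phase: each reducer aggregates the grouped values for its keys.
counts = {word: sum(values) for word, values in shuffled.items()}

print(counts["hadoop"])  # total occurrences across all input splits
```

In a real cluster, each of these phases runs distributed across many servers, and the sort performed during the shuffle is exactly the step whose tuning demands the scarce expertise mentioned above.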
Scalability and ease of use are both real challenges that organizations are facing today. That’s why Syncsort’s focus on Hadoop continues to center around removing the barriers to wider adoption by accelerating performance and reducing the complexity of Hadoop deployments. Our tests have shown up to 2x faster performance at scale (using the same amount of hardware). This means organizations can potentially defer additional hardware purchases while coping with more data and increasing performance requirements. Additionally, developers can use DMExpress to easily leverage the Hadoop framework without the need to learn or code MapReduce jobs.
When executives make public statements in The Wall Street Journal that Hadoop “isn’t easy to work with… partly because it requires that engineers deploy software across lots of different servers,” it is important to listen. That is exactly what Syncsort is doing and why we are getting the word out that DMExpress can help organizations significantly reduce the number of servers required to meet performance goals. It is also why when these same executives say that Hadoop “comes with additional costs of hiring in-house expertise and consultants” that we nod our heads in agreement. DMExpress Hadoop Edition can help reduce those costs by providing an easy-to-use graphical interface that eliminates the need for additional MapReduce experts for constant coding and tuning.
We are listening and would love to hear from you. What has your experience been with Hadoop and are scalability and ease of use also challenges you are facing? We just might be able to help!