3 Ways Hadoop is Growing Up
Hadoop has served as both the poster child and the whipping post for big data. On the one hand, Hadoop was among the first viable solutions for leveraging big data and quickly grew into the go-to framework for big data initiatives. On the other hand, it took flak for being hard to use, slow and clunky, occasionally unreliable, and lacking in security. But that was yesterday's Hadoop. Unless you've taken a look at it in the past year, you haven't seen what it has to offer. Hadoop is growing up, and here is the proof.
1. Spark Is Outpacing MapReduce
Where there’s a spark, there’s likely to be fire, and Spark is burning past MapReduce pretty quickly.
MapReduce took heat from the beginning for being difficult to work with. Spark has risen to overtake MapReduce as the most active open source project in big data, and as much as half of all Spark workloads don't even run inside Hadoop. Businesses want to be able to process and analyze data without having to run to IT for help all the time, and that makes easy-breezy Spark an attractive alternative: you can get lots more BI out of lots fewer developer hours using Spark instead of MapReduce. Spark and Hadoop make great pals, so expect the duo to be strong players in the realm of big data for the foreseeable future.
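The brevity gap is easiest to see in code. The classic word-count job takes dozens of lines of mapper/reducer boilerplate in MapReduce Java, while Spark expresses it as a few chained calls. Here's a hedged sketch of that pipeline in plain Python (standing in for Spark's `flatMap`/`reduceByKey` primitives, since spinning up a real cluster is beyond a blog snippet):

```python
from collections import Counter

def word_count(lines):
    """Spark-style word count in plain Python:
    flatMap(split) -> map to (word, 1) -> reduceByKey(add)."""
    # flatMap: one line fans out into many words
    words = (word for line in lines for word in line.split())
    # reduceByKey with addition, which Counter does for us
    return dict(Counter(words))

counts = word_count(["big data big deal", "big data"])
# counts == {"big": 3, "data": 2, "deal": 1}
```

In actual PySpark this is the same shape: `rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)` — three calls versus separate Mapper and Reducer classes plus a driver in MapReduce.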
2. Hadoop is a Major Player in the IoT
Big data is the backbone of the IoT (Internet of Things), and since Hadoop is practically synonymous with big data, Hadoop is a major player in the IoT as well. While real-time data streaming was problematic for Hadoop before, the new and improved version handles it rather well. Hortonworks DataFlow (HDF) made enormous strides in empowering Hadoop for the IoT, now providing many of the same features supported by Flume, Kafka, and Storm behind a clean design interface. MapR is doing its part with MapR Streams, which supports global event streaming at IoT scale. These offerings, along with vendor services like Azure IoT Hub, have further empowered Hadoop to become a viable streaming solution for the IoT.
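What these streaming tools all provide, at their core, is windowed aggregation over an unbounded feed of sensor events. As a hedged illustration (plain Python rather than any specific engine's API, with hypothetical sensor names), here is the tumbling-window pattern that a Storm topology or Spark Streaming job applies at IoT scale:

```python
from collections import defaultdict

def window_averages(events, window_secs=60):
    """Group (timestamp, sensor_id, reading) events into tumbling
    windows and average each sensor's readings per window -- the core
    pattern behind Storm/Spark Streaming-style IoT pipelines."""
    buckets = defaultdict(list)
    for ts, sensor, value in events:
        # integer division assigns each event to exactly one window
        buckets[(ts // window_secs, sensor)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

events = [(0, "temp-1", 20.0), (30, "temp-1", 22.0), (65, "temp-1", 21.0)]
# window 0 (0-59s) averages 21.0; window 1 (60-119s) holds 21.0
```

A real deployment would do this incrementally over a Kafka topic rather than a list in memory, but the grouping-by-window logic is the same.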
3. Hadoop Security & Governance Has Come a Long Way, Baby
The security and governance concerns that plagued Hadoop for years should be all but conquered with changes implemented in 2015 and 2016.
Security and governance were two areas where Hadoop took a beating. Now that enterprises are seriously eyeing the evolving Hadoop platform, it's time for enterprise-grade security. Hadoop's data governance issues were largely addressed with the release of Apache Falcon last year, and Falcon is now part of HDP version 2.2+. This tool gives developers and other data stewards the power to define rules for data consumption, access, and lifecycle management. Falcon features auditing as well, so if anything does go amiss, the admin can hunt down the source of the problem. In 2016, Hortonworks plans to roll out an even bigger list of new security and governance features, so by year's end, enterprise adoption is expected to be impressive. In addition, Cloudera has enhanced a key component of its security and governance solution with a new release of Cloudera Navigator Encrypt. Navigator Encrypt complements HDFS encryption to provide encryption at rest for sensitive data in the Hadoop cluster, including temp/spill files, metadata databases, and ingest volumes. It works together with Navigator Key Trustee so that encryption keys are managed following the security best practices required by a number of compliance regulations, including PCI and HIPAA.
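Those lifecycle rules are expressed as declarative XML entities that Falcon enforces for you. A hedged sketch of a feed definition (the feed name, cluster name, and paths here are hypothetical) that keeps raw clickstream data for 90 days and then deletes it might look like this:

```xml
<feed name="raw-clickstream" xmlns="uri:falcon:feed:0.1">
  <frequency>hours(1)</frequency>
  <clusters>
    <cluster name="primary-cluster">
      <validity start="2016-01-01T00:00Z" end="2099-12-31T00:00Z"/>
      <!-- lifecycle rule: purge data older than 90 days -->
      <retention limit="days(90)" action="delete"/>
    </cluster>
  </clusters>
  <locations>
    <location type="data" path="/data/clickstream/${YEAR}-${MONTH}-${DAY}"/>
  </locations>
  <ACL owner="etl" group="analytics" permission="0750"/>
  <schema location="/none" provider="none"/>
</feed>
```

Once an entity like this is submitted and scheduled, Falcon handles the retention sweeps itself, and its audit trail records what was purged and when.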