Happy Anniversary Hadoop
Small Start, Big Data Result
“Release 0.1.0 of Hadoop is now available.” It was a simple message that a certain Doug Cutting posted on April 2, 2006. The result, years later, is anything but simple. Dwarfed only by mass-market products like MS-DOS, Windows, Linux and the web browser, Hadoop’s announcement has come to be seen as one of the most transformative milestones in enterprise computing.
For many, “Big Data” means Hadoop.
In a 2013 observance of the seventh anniversary of its first release, Hadoop founder Doug Cutting (who is also Cloudera’s chief architect) was downright sentimental. “People love a story,” he said, referring to the product’s name, borrowed from a toy elephant, and went on to defend its whimsy. Whether because of that, or due to the surging power of the Apache Software Foundation open source ecosystem, Cutting believed Hadoop had become the most mainstream version of distributed computing that the industry had seen so far.
Impact Far, Wide and Big
One notable consequence is a healthy commercial space for Hadoop featuring important players like Cloudera (570 employees), Hortonworks (600 employees) and MapR, each of which offers its own distribution. Intel looked these kids over and decided to invest $740M in one of them (Cloudera).
But these official clusters of Hadoop brainpower are small in comparison to the number of enterprises where Hadoop has made a place for itself by cozying up to other power tools in the enterprise toolshed.
A good example of this role was highlighted by SciQuest, which describes itself as a cloud-based “spend management solution” provider. SciQuest uses Hadoop to identify permutations of supplier data in large organizations.
“We took a big, early bet on cloud computing, which is proving to be a preferred option in the technology/software market, and our adoption of Hadoop is showing similar promise – not just for SciQuest, but for our customers as well.”
Growing Up to Scale Up
In version 2, Hadoop moved from its batch beginnings to embrace batch, interactive, online and streaming modes. It also supported larger clusters, which gave it more parallelized horsepower. With the advent of Yet Another Resource Negotiator (YARN), Hadoop separated cluster-wide resource management from the scheduling and monitoring of individual applications. Version 1 had loaded both duties onto a single JobTracker; with that bottleneck split apart, Hadoop in version 2 became a formidable scalable beast. According to a Hortonworks study around 2012, engineers were persuaded they could “simulate 10,000 node clusters composed of modern hardware without significant issue.”
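For readers who like to see the idea in code, the division of labor that YARN brought can be pictured with a toy model. This is only a sketch under loud assumptions: the class names echo YARN's ResourceManager and ApplicationMaster roles, but every class, method and task name below is hypothetical and has nothing to do with the real Hadoop API. One object hands out container slots for the whole cluster, while each application decides for itself which of its tasks to run in the slots it receives.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResourceManager:
    """Toy cluster-wide allocator (hypothetical): it only counts free container slots."""
    free_containers: int

    def allocate(self, requested: int) -> int:
        # Grant as many slots as are free, never more than requested.
        granted = min(requested, self.free_containers)
        self.free_containers -= granted
        return granted

    def release(self, count: int) -> None:
        self.free_containers += count


@dataclass
class ApplicationMaster:
    """Toy per-application scheduler (hypothetical): it picks which tasks use granted slots."""
    name: str
    pending_tasks: List[str]
    running_tasks: List[str] = field(default_factory=list)
    finished_tasks: List[str] = field(default_factory=list)

    def start_tasks(self, rm: ResourceManager) -> int:
        # Ask the cluster for slots, then schedule this application's own tasks into them.
        granted = rm.allocate(len(self.pending_tasks))
        self.running_tasks = self.pending_tasks[:granted]
        self.pending_tasks = self.pending_tasks[granted:]
        return granted

    def finish_tasks(self, rm: ResourceManager) -> None:
        # Mark running tasks as done and hand their slots back to the cluster.
        self.finished_tasks.extend(self.running_tasks)
        rm.release(len(self.running_tasks))
        self.running_tasks = []


def run_all(rm: ResourceManager, apps: List[ApplicationMaster]) -> int:
    """Run scheduling rounds until every application has no pending tasks; return the round count."""
    rounds = 0
    while any(app.pending_tasks for app in apps):
        granted_total = sum(app.start_tasks(rm) for app in apps)
        if granted_total == 0:
            raise RuntimeError("cluster has no free containers; no progress is possible")
        for app in apps:
            app.finish_tasks(rm)
        rounds += 1
    return rounds


if __name__ == "__main__":
    cluster = ResourceManager(free_containers=3)
    jobs = [
        ApplicationMaster("etl", ["m1", "m2", "m3", "m4"]),
        ApplicationMaster("report", ["r1", "r2"]),
    ]
    print(run_all(cluster, jobs))  # prints 2
```

The point of the sketch is the separation itself: the allocator never knows what a task is, and an application never knows how many slots the cluster holds. In version 1, by contrast, a single JobTracker carried both responsibilities.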
You Say You Want a Revolution
Marking a previous anniversary, Doug Cutting said, “We are still in the early days of this revolution.” Syncsort, with its big data integration solutions for Hadoop, is a card-carrying member of the Party. But give the last word to Teradata’s Oliver Ratzesberger, who wrote that “some may argue that there are only three certainties in life: death, taxes and Hadoop.”