Hadoop Co-Creator on the Future of Big Data

When one of Hadoop’s original developers tells other developers to pay attention to Google’s prognostications, it is advice IT executives ignore at their peril.

“Google is living a few years in the future and is sending the rest of us messages,” Doug Cutting told the O’Reilly Strata Conference in London this November.

One example is Google Spanner, a distributed database that has attracted attention for its use of a concept Google calls TrueTime. TrueTime allows Google data centers across the world to remain in sync with each other while avoiding excessive latencies.

Cutting’s Hadoop, the open source distributed storage and processing project he co-founded, owes a debt to Google’s earlier MapReduce concept. Synchronization is an important part of keeping Hadoop responsive and scalable across widely distributed data stores. Further study of Spanner’s implementation will likely be reflected in future releases of Hadoop, or in entirely new offerings inspired by Spanner.

Watch Cutting’s presentation below, or scroll down for more predictions.


Doug Cutting on The Future of Data

Whither OLTP
Cutting does not anticipate the imminent death of relational database systems, or of the mature ecosystems around them. But he does envision that some aspects of large enterprise information requirements will demand “Google-like” flexibility or agility. Enterprises will need to determine how to keep running their OLTP systems while embracing Big Data.

Weaving YARN into Distributed Data Systems
Extending Hadoop to handle new and different types of processing loads is the focus of an emerging set of support tools. Applications such as machine learning or real-time event processing, for example in smart grid sensor networks, will place new demands on Hadoop clusters. Processing will need to be staged, queued and scheduled.

New support tools like Apache YARN, Hortonworks’ commercial YARN and Syncsort’s Ironcluster for Amazon EMR and Hadoop ETL represent an emerging ecosystem for Hadoop that anticipates real-world enterprise requirements.

Graph Databases
Graph-oriented tools such as Apache Hama and Faunus leverage the Hadoop Distributed File System (HDFS) but serve a different set of applications. They may prove useful for enterprises engaged in large-scale research, engineering and genomics.

Hadoop-Sweetened Business Intelligence Suites
Business intelligence suites such as Tableau, QlikView, SAP HANA and Business Objects can access Hadoop stores through a variety of methods, including those provided by Syncsort or Cloudera. This relatively recent Hadoop flavoring allows architects to concoct heterogeneous recipes that combine online transaction processing (OLTP) and HDFS.

Back to School for Updated Lessons
Distributed processing has been heavily studied in academia since the 1970s. Faster pipes, Big Data and mega-core computing clusters can be seen as long-anticipated evolutionary changes. Sorting, compression, bit maps, synchronization techniques and other approaches continue to be essential software building blocks. Seminal work such as C.A.R. Hoare’s Communicating Sequential Processes continues to influence distributed systems designs.

Managers will want to take stock of the current skills of staff engineers. Updated education will likely need to encompass more than Hadoop fundamentals.

The Other Oracle
Cutting was not suggesting that Larry Ellison’s company can be ignored as irrelevant — only that Google is emerging as the other oracle in Silicon Valley, and we would do well to listen to the company’s predictions.

Mark Underwood writes about knowledge engineering and Big Data.
