
Expert Interview (Pt 1): Trends in Hadoop and Cloud Data with Tony Baer

Tony Baer, Principal Analyst at Ovum, recently spoke to us about trends in Big Data, including the future of Hadoop and cloud data.

Which directions are Big Data and data management headed in?

I think the obvious headline at the moment is artificial intelligence and machine learning. It’s the headline that is drowning out everything else. It’s drowned out things like real-time streaming and IoT.

But I also think, and this is maybe happening in the background, that there’s an ongoing trend toward implementing data analytics in the cloud. The reason is this: Hadoop is hard to implement. You need rocket science to pull it off, even in organizations with lots of expertise and resources. So, what I’m seeing is cloud deployment becoming an overarching trend, because it eliminates the need for IT departments to set up Hadoop themselves.

Ovum’s research shows that 27.5 percent of all Big Data workloads are now being implemented in the cloud, and I predict that by year-end 2018 or Q1 2019, half or more of new Big Data workloads will be deployed in the cloud.

By next year, then, the cloud will essentially be the default location for Big Data.

What technology or industry trend looks exciting to you?

The trend toward the cloud may not be exciting, but it is profound. It’s fundamentally changing not just where people analyze data, but the architectures they use for data management.


Hadoop and cloud data: Over a quarter of Big Data workloads are currently implemented in the cloud, and Baer predicts that share will rise to 50% or more by year end. See our related blog post, 5 Reasons to Run Hadoop in the Cloud.

As for excitement, machine learning and artificial intelligence are a better fit. But they are also scary. You might check out the book Weapons of Math Destruction by Cathy O’Neil, which points out that AI is only going to be as safe as the data sets that humans feed into it. So, yes, this is exciting, but there will be challenges to getting machine learning right.

Circa five years ago Hadoop was virtually synonymous with Big Data, and today people seem to be mentioning it less and less. Why do you think that is?

It’s a well-known phenomenon that a shiny new thing comes out, then reality happens and we start to view it in a more even-handed way. That has happened with Hadoop, and it’s a natural process. In particular, people are starting to realize just how difficult the implementation of Hadoop is, and that’s been a reality check.


That said, Hadoop has certain very important advantages. Even if the hype has quieted down, Hadoop is certainly not going away or becoming less influential. It’s a workhorse platform for many different workloads. And although Hadoop is not yet as secure as some of the major enterprise databases, there is certainly a lot of security and data governance that is now being built into it. It’s a well-rounded data management platform. This makes Hadoop different from, say, Spark, which has no built-in data management.

What are the coolest new things about Hadoop 3? Is there anything you wish were in Hadoop 3 but is not?

I think what’s interesting about Hadoop 3 is that it’s acknowledging something that I’ve always called for, which is that with any type of data platform, you have a lifecycle for data. Sure, big enterprises like Facebook and Google will always keep data live, so they don’t worry about lifecycles very much. But that’s not the case for mere mortal organizations: They have to deal with tiering and the reality of archiving. Also, directives like GDPR create requirements such as the right to be forgotten.

Previously, Hadoop did not have good answers for this. You could take an HDFS file system and delete it, but that was it. Now, Hadoop has erasure coding, which allows more compact, RAID-like storage. That’s probably the feature that really sticks out.
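To make the "more compact" point concrete, here is a rough sketch comparing the raw disk needed for the same data under replication and under erasure coding. It assumes three-way replication (the usual HDFS default) and a Reed-Solomon layout with 6 data units and 3 parity units, one of the policies commonly cited for Hadoop 3. The function names are hypothetical and used only for illustration; they are not part of any Hadoop API.

```python
def raw_bytes_replicated(logical_bytes: int, replicas: int = 3) -> float:
    """Raw storage consumed when every block is copied `replicas` times.

    Assumption: replicas=3 mirrors the usual HDFS default.
    """
    return float(logical_bytes * replicas)


def raw_bytes_erasure_coded(
    logical_bytes: int, data_units: int = 6, parity_units: int = 3
) -> float:
    """Raw storage consumed when each group of `data_units` data cells
    is stored alongside `parity_units` parity cells.

    Assumption: 6 data + 3 parity models a Reed-Solomon (6, 3) policy.
    """
    return logical_bytes * (data_units + parity_units) / data_units


def overhead_ratio(raw_bytes: float, logical_bytes: int) -> float:
    """Extra storage beyond the logical data, as a fraction of the data."""
    return raw_bytes / logical_bytes - 1


if __name__ == "__main__":
    logical = 100  # e.g. 100 TB of logical data; the unit does not matter
    replicated = raw_bytes_replicated(logical)
    coded = raw_bytes_erasure_coded(logical)
    print(f"replication:    {replicated:.0f} raw, "
          f"{overhead_ratio(replicated, logical):.0%} overhead")
    print(f"erasure coding: {coded:.0f} raw, "
          f"{overhead_ratio(coded, logical):.0%} overhead")
```

Under these assumptions, erasure coding stores the same data in half the raw space of three-way replication while still tolerating the loss of several units. The usual trade-off is extra computation when encoding data and when rebuilding lost blocks.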

Be sure to check out tomorrow’s Part 2, in which Tony Baer discusses what’s next for Hadoop and cloud and whether we’re prepared for GDPR.

Download our eBook, 2018 Big Data Trends: Liberate, Integrate & Trust, for 5 Big Data trends to watch for in the coming year.
