At Hadoop Summit, Syncsort’s Paige Roberts sat down with Hortonworks Chief Technology Officer, Scott Gnau. In this interview, Scott talks about what Hadoop is, from his perspective, how it fits into the wide world of data analytics, and what’s coming down the pipeline that’s particularly exciting.
In part 1, they talked about Hadoop, Spark and Hortonworks’ reseller partnership with Syncsort. In part 2, Paige and Scott talk about the state of governance in Hadoop and metadata management, what’s new in cyber security and why it’s important.
Paige Roberts: One of our customers’ big questions is about metadata management and governance. We’re seeing that Atlas is starting to gain ground. What would you say is the state of governance and metadata management in Hadoop?
Scott Gnau: I think it’s still a very nascent space. It’s obviously something we at Hortonworks think is extremely important. But it’s also extremely important that we solve any of those big problems with a community approach. Sometimes getting the community to rally can be more time-consuming than in a proprietary software company where you can say, “Alright, you, go build this solution.” You can direct it. You can make it start. You can make it finish. Working with a community requires a bit more finesse and I think, from a startup perspective, it can take a little time to get people’s attention. We’ve done that.
We got the community together. We got a number of customers to collaborate with. We got contributions not only from Hortonworks, but from across the community on Atlas. The thing is now starting to gain momentum. Also, I think the community approach is important because it needs to be an open API that everyone will adopt or it will fail. We saw that in the RDBMS world. No one really emerged as a leader in metadata management because every RDBMS out there had its own standard. There was this fragmentation in the market that didn’t allow it to be completely successful. Certainly there were tools and applications, but in the end, there wasn’t one uber solution to metadata management in RDBMS.
In our world, it is a much more complex problem because the data sets are much more diverse. They change more rapidly. There are more eyeballs and hands touching the data. So, it’s also a bigger problem to solve, and I think the community can do that by causing everyone to come together, and reducing that fragmentation that we saw in some of the earlier markets. That said, maybe [it isn’t happening] as quickly as we would like, but now we’re there, we’ve got momentum. We’re on the second major release of Atlas, and we’re starting to gain some ground.
Obviously, the approach you take in that governance space is the lowest common denominator. How do I create a common interface for data tagging and data tags that stay with the data all throughout its lifecycle, regardless of who touches it or what applications it’s loaded into, so that you can always trace it back from a governance perspective? Also, where we’ve integrated with Ranger, you can now apply security rules based on metadata tags. So I think it’s a very powerful framework. It’s going to continue to mature, and add feature functionality, integration, operationalization, all the things that you expect as something gets off the ground and becomes more mainstream. Obviously, we’re happy that a number of application vendors are looking at how to plug into Atlas and build a contextual kind of user interface around metadata management, taking advantage of the tags that are built in the platform.
Sorry, that wasn’t exactly a yes or no answer.
No problem. That would be boring. And a really short interview.
[laughing] Yes, of course, that’s always the answer, the answer to life, the universe and everything. So, tell me about something cool coming up that we don’t know about yet.
I think Apache Metron is interesting. We had a keynote speaker [at Hadoop Summit] talk about the advantages. There are two things I like about Apache Metron that I think are exciting. The first, when I think about cyber [security] and all the threats that are out there, it’s almost like there’s a community of cyber terrorists out there that are clamoring against the free world and free information flow. Frankly, they’re making the world more of a pain in the neck than it needs to be. So, I like the idea of an open-source community kind of banding together to create technology to protect free information flow. I think there’s something very fair about it.
It’s very comic book awesome justice.
Yeah. You’d get that. You’re wearing a Captain America shield on your shirt.
Yeah, it’s a Syncsort shirt, really, but I love it because I’m a massive comic book geek.
So you’ve got that. And then, the second thing that I think is interesting about it is that it really is an example of what we call a modern data application. It takes advantage, not only of historical data, but also of streaming data for a net benefit. So instead of analyzing logs after the fact and understanding that you were hacked, you can do that to create analytics. Then you can apply those analytics in real-time. You can say, “Hey, this bozo is actually trying to hack me now. Let me shut that port off.”
We talk about modern-day applications being able to make decisions at the right time, or in real-time using streaming data, taking advantage of the collective knowledge you have from data at rest.
Cyber security is the ultimate example of that.
Cyber security is a really great example of that. It’s completely horizontal. It impacts every business, every company that’s out there. It’s a really great use case for us to exemplify what modern-day architectures are really about.
Is there anything else you’d like to add?
Hortonworks is way cool.
I totally agree. Hortonworks is way cool.