At the recent Data Day in Texas, Paige Roberts of Syncsort caught up with Joey Echeverria, an architect at Splunk, and author of the O’Reilly book, Hadoop Security. In part one of this blog series, Roberts and Echeverria discussed the common Hadoop security methods.
In part two, Echeverria goes into more detail on the different methods of securing Hadoop.
Roberts: Have you seen some of the interaction between Atlas and Ranger to work on the authorization, where you use Atlas tags and then Ranger goes in and gives authorization to the data that’s tagged a specific way?
Echeverria: Yeah. Most role-based access control systems need some metadata associated with the data objects you're going to access, so that you can use that metadata to grant roles permission to access them. Using things like Atlas to tag the data or the data objects, and then giving roles permission to read certain tags, is a very fine-grained way of giving access to data. It gives you a little bit more flexibility than having to always organize the data in a particular namespace, or having to grant permissions one table at a time. There are a lot of advantages to using tag-based approaches for configuring your security controls.
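As a rough illustration of the tag-based idea described above, here is a minimal Python sketch. It does not use the Atlas or Ranger APIs; the names `DataObject`, `ROLE_READ_TAGS`, and `can_read`, and the tags and roles themselves, are hypothetical. It also assumes the simplest possible rule: a role may read an object if the object carries at least one tag the role is allowed to read. Real policy engines typically add deny rules, conditions, and auditing on top of this.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataObject:
    """A table, file, or column, plus the tags attached to it (hypothetical model)."""
    name: str
    tags: frozenset = field(default_factory=frozenset)


# Hypothetical policy: each role maps to the set of tags it is allowed to read.
ROLE_READ_TAGS = {
    "analyst": {"public", "sales"},
    "auditor": {"public", "pii"},
}


def can_read(role: str, obj: DataObject) -> bool:
    """Allow the read if the object has at least one tag the role may read."""
    allowed_tags = ROLE_READ_TAGS.get(role, set())
    return bool(allowed_tags & obj.tags)


# Permissions follow the tags, not the namespace or the individual table.
orders = DataObject("warehouse.orders", frozenset({"sales"}))
customers = DataObject("crm.customers", frozenset({"pii"}))
```

In this sketch, tagging a new table with `sales` makes it readable by the `analyst` role without touching the policy, which is the flexibility over per-table grants that the answer points to.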
Roberts: So, if I’m setting up a Hadoop cluster and I’m really worried about security, what’s the thing that I should be worried about?
Echeverria: It’s hard to distill it to just one or a small number. The issue is that, by default, these systems come out of the box unsecured. It’s not like they come pre-secured and then you just remove the controls that you don’t need. They come unsecured and you have to enable everything. You’ll probably want to build a checklist first and have a plan for how you’re going to roll out your security. You want to think about how you want to manage users, how you’re going to manage roles, how you’re going to manage the data itself. You also want to avoid having privileged accounts when possible. The other thing that I mentioned earlier is you want comprehensive auditing. Hadoop has the ability to do the equivalent of a Linux sudo, where you are one user but then you execute a command on behalf of another user, which is fine, but when you do that you want to make sure that you have a full audit trail there.
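The point about auditing sudo-style impersonation can be made concrete with a small Python sketch. This is not Hadoop's implementation or API; `ALLOWED_PROXIES`, `AUDIT_LOG`, and `run_as` are hypothetical names, and the in-memory list stands in for a real audit store. The idea it shows is that every attempt, allowed or denied, records both the real user and the user being impersonated.

```python
from datetime import datetime, timezone

# Hypothetical allowlist: which real users may act on behalf of which other users.
ALLOWED_PROXIES = {
    "etl_service": {"alice", "bob"},
}

# In-memory stand-in for an audit store.
AUDIT_LOG = []


def run_as(real_user: str, effective_user: str, action: str) -> str:
    """Perform `action` on behalf of `effective_user`, auditing the attempt first."""
    allowed = effective_user in ALLOWED_PROXIES.get(real_user, set())

    # Record the attempt before acting, so denied attempts are audited too.
    AUDIT_LOG.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "real_user": real_user,
        "effective_user": effective_user,
        "action": action,
        "allowed": allowed,
    })

    if not allowed:
        raise PermissionError(f"{real_user} may not act as {effective_user}")
    return f"{action} executed as {effective_user}"
```

Writing the audit entry before the permission check fails means the trail captures who tried to act as whom, not only the impersonations that succeeded.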
Make sure to tune in for the final part where Echeverria goes into the latest with Splunk, as well as the differences between Apache Spark and Flink.