Hadoop has matured considerably over the past few years, and has come quite a ways in terms of security. In its first incarnation, Hadoop’s security was somewhat lacking. Even so, data security isn’t something you leave in the hands of any “framework” or “platform”. You have to tweak, hone, refine, and continually improve your security settings, particularly when dealing with lucrative big data. Here are some relatively simple ways to make your Hadoop environment more secure ASAP.
1. Secure Your Hadoop Data at Rest
Due to the nature of Hadoop clusters, as soon as you unleash data in there, it skedaddles to an untraceable corner of your environment. You aren’t really sure which server it landed on. That’s why data encryption at rest is so essential. Since many Hadoop vendors don’t offer encryption for data at rest, it’s easy to overlook, but this single step can mean the difference between a secure Hadoop environment and a painful data breach. Plus, when it’s time to retire your servers, you don’t have to worry so much about what’s on them. You can just discard them as usual.
2. Know Your Data
The thing about big data is, well, it’s really big. Sometimes you don’t even know what all’s in there. But without a reliable inventory of your data, you will be forced to provide the highest level of security to all your data. Or, worse, you provide all your data with the lowest security, and live to regret it. Either extreme is actually harder and more wasteful than an initial data audit. Get to know what your data is made of, and then you can provide security that matches the sensitivity of the data involved.
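A first pass at a data inventory can be as simple as sampling each field and guessing how sensitive it is. The sketch below is only an illustration: the patterns, the tier names, and the functions `classify_column` and `inventory` are hypothetical, and a real audit would need much broader, validated rules.

```python
import re

# Hypothetical patterns for illustration only; real inventories need
# broader, validated detection rules.
PATTERNS = {
    "credit_card": re.compile(r"^\d{13,19}$"),
    "date_of_birth": re.compile(r"^\d{4}-\d{2}-\d{2}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

# Hypothetical sensitivity tiers assigned to each detected data type.
SENSITIVITY = {
    "credit_card": "high",
    "date_of_birth": "medium",
    "email": "medium",
}


def classify_column(values):
    """Return a sensitivity tier for one column from a sample of its string values."""
    sample = [v for v in values if v]
    if not sample:
        return "low"
    for label, pattern in PATTERNS.items():
        # Only tag the column if every sampled value matches the pattern.
        if all(pattern.match(v) for v in sample):
            return SENSITIVITY[label]
    return "low"


def inventory(records):
    """Map each field name in a list of dict records to a sensitivity tier."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return {field: classify_column(vals) for field, vals in columns.items()}
```

Run against a small sample of records, `inventory` returns a field-to-tier map that you can use to decide which fields deserve the strongest protection.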
3. Do Your Due Diligence with Threat Modeling
Threat modeling is a process for identifying the vulnerabilities of your data and then developing countermeasures to protect it, or to mitigate the threats to it. Threat modeling helps you understand how the data could be used, and this is not always as apparent as it seems. For example, if a hacker gets his grubby hands on your customers’ dates of birth, that isn’t such a big deal. But if that information is supplemented with things like physical addresses, it becomes much more valuable on the black market, and therefore calls for much more security in your Hadoop environment.
4. Protect Data as It is Re-Identified
Tokenization and format-preserving encryption are both excellent techniques for obscuring data as it is analyzed. Sometimes one is best, other times the other is preferable. You’ll likely find uses for both, at times. But data needs to be protected both at rest and as it is in use within your Hadoop environment.
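To make the tokenization idea concrete, here is a minimal sketch of a token vault: sensitive values are swapped for random tokens, and only the vault can map a token back. The `TokenVault` class and the `tok_` prefix are hypothetical, and the vault here lives in memory purely for illustration; a real one would be a hardened, access-controlled store outside the cluster. Format-preserving encryption is not shown.

```python
import secrets


class TokenVault:
    """In-memory token vault (hypothetical, for illustration only)."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return a random token for value, reusing the token if value was seen before."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        # Regenerate in the very unlikely event of a collision.
        while token in self._token_to_value:
            token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Return the original value for token; raises KeyError for unknown tokens."""
        return self._token_to_value[token]
```

Because the same value always maps to the same token, analysts can still join and count on the tokenized field without ever seeing the original value.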
5. Identify Which Values Within Your Data are Sensitive
A good example of this is credit card numbers. Some of the digits in a credit card number identify the bank that issued the card. Other digits have no meaning or purpose outside the transaction being performed. If you understand which of the digits you actually need to store, you can mask and encrypt data in such a way as to make it possible for you to identify your data, but impossible for the bad guys to do so.
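As a rough sketch of the masking half of that idea, the function below keeps the leading digits that identify the issuer and the last few digits, and hides everything in between. The function name `mask_card_number` and the defaults of six leading and four trailing digits are assumptions for illustration; how many digits you may keep depends on your own requirements, and encryption of the full number is not shown.

```python
def mask_card_number(card_number, keep_prefix=6, keep_suffix=4, mask_char="*"):
    """Mask the middle digits of a card number, keeping a prefix and suffix visible.

    The defaults are assumptions for illustration, not a compliance recommendation.
    """
    # Ignore spaces, dashes, and any other non-digit separators.
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) < keep_prefix + keep_suffix:
        raise ValueError("card number too short to mask safely")
    hidden = len(digits) - keep_prefix - keep_suffix
    prefix = "".join(digits[:keep_prefix])
    suffix = "".join(digits[len(digits) - keep_suffix:])
    return prefix + mask_char * hidden + suffix
```

For example, `mask_card_number("4111 1111 1111 1111")` returns `411111******1111`, which is still enough to recognize the issuer and match a record, but not enough to reuse the card.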
If you’re looking for the right tools and solutions to build and operate a secure Hadoop environment, you should take a look at Syncsort’s Big Data Integration solutions now.