You can store all your data for some amount of time, and you can store some of your data indefinitely, but you can’t store all your data forever. Not even a company the size of Google has the storage capacity to retain everything.
This means you must decide which types of data to store for an extended period, as well as just how long that period should be.
Data Retention: How Long Should Data Be Stored?
Once you’ve identified the types of data that you’ll store, you must decide how long to store each of them. Two main factors should guide this decision.
1. Regulatory Policies
If you have to keep certain types of data around to satisfy compliance requirements, the compliance policies usually define how long you have to retain the information. For example, data governed by HIPAA is subject to data retention requirements set by states (though not by HIPAA itself).
Keep in mind that the retention periods defined by compliance policies are generally minimums. You can keep the data longer if you wish, and doing so may pay off if the data helps you understand your customers or gain other business insights. At the very least, though, make sure you meet the required minimum retention periods.
2. Data Storage Costs
When deciding how long to retain data, you must also consider the cost of storing it for a prolonged period. Some simple math helps here. First, determine your storage costs. This is straightforward if you store data in a public cloud like AWS or Azure. It is more complicated if you store data on on-premises infrastructure, but there, too, you should be able to determine how much it costs you to maintain the servers that store the data.
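As a rough sketch of that math for the cloud case, the snippet below assumes a flat price per gigabyte per month. The function names and the price used in the usage example are hypothetical placeholders, not actual AWS or Azure rates, and real bills usually include request, retrieval, and transfer charges that this ignores.

```python
def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Cost of keeping size_gb of data for one month at a flat per-GB rate.

    Assumption: pricing is a single flat rate; tiered pricing and
    request/egress fees are deliberately left out of this sketch.
    """
    if size_gb < 0 or price_per_gb_month < 0:
        raise ValueError("size and price must be non-negative")
    return size_gb * price_per_gb_month


def retention_cost(size_gb: float, price_per_gb_month: float, months: int) -> float:
    """Total cost of retaining a fixed-size dataset for a number of months."""
    if months < 0:
        raise ValueError("months must be non-negative")
    return monthly_storage_cost(size_gb, price_per_gb_month) * months


if __name__ == "__main__":
    # Hypothetical inputs: 500 GB at an illustrative 0.02 per GB-month, kept 3 years.
    total = retention_cost(size_gb=500, price_per_gb_month=0.02, months=36)
    print(f"Estimated cost over 36 months: {total:.2f}")
```

For on-premises storage, the same functions could be reused by replacing the per-GB price with your own estimate of server maintenance cost divided by usable capacity.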
You also have to determine how much the data is worth. This is a bit of a gray area, but it should not be impossible to put a value on certain types of data. For example, by reviewing your marketing and sales data, you should be able to estimate how much it costs you on average to obtain data about a customer or potential customer. You can then use this figure to decide when the cost of storing the data exceeds the cost of generating it, and plan your data retention policy accordingly.
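The comparison described above can be sketched as a simple break-even calculation. This is a minimal illustration, assuming a constant monthly storage cost per record and a single average acquisition cost. The function names and example figures are hypothetical, and the regulatory minimum is treated as a floor that the cost argument cannot lower.

```python
import math


def break_even_months(acquisition_cost: float, monthly_storage_cost: float) -> float:
    """Months of storage after which cumulative storage cost equals the
    cost of obtaining the data in the first place."""
    if monthly_storage_cost <= 0:
        raise ValueError("monthly_storage_cost must be positive")
    return acquisition_cost / monthly_storage_cost


def retention_months(
    acquisition_cost: float,
    monthly_storage_cost: float,
    regulatory_minimum_months: int = 0,
) -> int:
    """Suggested retention period in whole months.

    Takes the last full month before storage cost exceeds acquisition
    cost, but never goes below the regulatory minimum.
    """
    economic_limit = math.floor(
        break_even_months(acquisition_cost, monthly_storage_cost)
    )
    return max(economic_limit, regulatory_minimum_months)


if __name__ == "__main__":
    # Hypothetical figures: a customer record costs 12.00 to obtain
    # and 0.50 per month to store.
    print(retention_months(12.0, 0.5))                                # cost-driven
    print(retention_months(12.0, 0.5, regulatory_minimum_months=36))  # compliance-driven
```

When the compliance minimum is longer than the break-even point, the minimum wins. Otherwise the break-even point marks where continued storage stops paying for itself under these assumptions.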
Streamlining Data Storage Costs
It’s important to keep in mind, by the way, that data storage costs can vary widely, and are likely to be unnecessarily high if you store data in environments that are outdated or difficult to integrate with the rest of your infrastructure. Slow data offloading or difficult data transformation adds significant cost to data storage.
This is where resources like Syncsort can help. To learn more, read the Bringing Big Data to Life eBook to find out how companies are integrating mainframe data into modern analytics environments like Hadoop in order to lower their data TCO, and thereby extend the amount of time they can retain that data.