Even if you analyze your data in real time, storing data for extended periods is important for compliance and other reasons. But what types of data should be retained and how long should you keep it? Keep reading for some insights on data storage.
Data Analytics Requires Data Storage
These days, real-time data analytics should be the foundation of most organizations’ approach to working with data. (For more on just how important real time has become, check out Syncsort’s Hadoop Market Adoption Survey report, which explains best practices for data management and analysis.) But that doesn’t mean you should interpret data as it streams in and then delete it forever.
On the contrary, keeping data around for a while, even after you’ve interpreted it, is important. It helps keep you compliant by ensuring that data remains available for audits or other reviews. It also gives you the opportunity to review historical data to identify long-term trends, or to investigate incidents that you may not discover until long after the related data was generated and processed.
Types of Data to Retain
The first step in building an effective data storage policy is to answer the question: Which types of data should I store for an extended period, and which can I delete instantly?
The short answer is that you should retain as much data as your storage capacity can support.
But since most organizations must decide which data types to prioritize for long-term storage, here’s a general hierarchy that outlines which types of data to keep on hand. The data at the top of the list is the most important to store for as long as possible, while the data at the bottom is the least important:
- Data that is required to be retained by compliance or regulatory policies. If you’re required by law to store a certain type of data, you should definitely keep that data around.
- Data that relates to your customers and helps you engage with them by achieving “customer 360.” Understanding your customers is hard, and you don’t want to give up the data that helps you with that challenge.
- Business documents, contracts and other records. These are important to store for as long as possible.
- Data that is generated by everyday business operations but is not regulated. This data can be helpful to have on hand for historical reviews or planning purposes, but it’s not essential.
- Machine data generated by your networking equipment, servers, sensors or other types of automated sources. Machine data tends to be the least useful type of data to store long term. It is sometimes useful to be able to review machine data when researching a technical incident or planning infrastructure expansions, but for the most part, machine data is only useful in real time, because the state of your infrastructure changes so quickly.
The exact types of data to prioritize for long-term storage will vary from organization to organization, of course. This hierarchy is just a general guide.
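To make the hierarchy a little more concrete, here is a minimal Python sketch of how a team might encode it when storage is limited. Everything in it is hypothetical and for illustration only: the `RetentionTier` names, the `Dataset` type, the `plan_retention` helper and the dataset sizes are assumptions, not part of any Syncsort product or standard. The sketch always keeps regulated data, then fills the remaining capacity tier by tier, from most to least important.

```python
from dataclasses import dataclass
from enum import IntEnum


class RetentionTier(IntEnum):
    """Hypothetical tiers mirroring the hierarchy above; lower value = higher priority."""
    REGULATED = 1      # required by compliance or regulatory policies
    CUSTOMER = 2       # supports a "customer 360" view
    BUSINESS_DOCS = 3  # business documents, contracts and other records
    OPERATIONAL = 4    # unregulated data from everyday business operations
    MACHINE = 5        # data from servers, networking equipment, sensors


@dataclass(frozen=True)
class Dataset:
    name: str
    tier: RetentionTier
    size_gb: float


def plan_retention(datasets, capacity_gb):
    """Hypothetical helper: return (keep, drop) lists, filling capacity in tier order.

    Regulated data is always kept. If regulated data alone exceeds capacity,
    raise an error, because dropping it is not an option.
    """
    keep, drop = [], []
    used = 0.0
    # sorted() is stable, so datasets within the same tier keep their input order
    for ds in sorted(datasets, key=lambda d: d.tier):
        if ds.tier == RetentionTier.REGULATED or used + ds.size_gb <= capacity_gb:
            keep.append(ds)
            used += ds.size_gb
        else:
            # Keep scanning: a smaller, lower-tier dataset may still fit
            drop.append(ds)
    if used > capacity_gb:
        raise ValueError("storage capacity is too small for regulated data alone")
    return keep, drop


datasets = [
    Dataset("server_logs", RetentionTier.MACHINE, 500),
    Dataset("audit_trail", RetentionTier.REGULATED, 200),
    Dataset("crm_history", RetentionTier.CUSTOMER, 300),
    Dataset("contracts", RetentionTier.BUSINESS_DOCS, 50),
]
keep, drop = plan_retention(datasets, capacity_gb=600)
print([d.name for d in keep])  # ['audit_trail', 'crm_history', 'contracts']
print([d.name for d in drop])  # ['server_logs']
```

One design choice worth noting: the loop does not stop at the first dataset that doesn’t fit, so leftover space can still go to smaller, lower-priority data. A stricter policy could stop at the first drop instead; which behavior fits best depends on your organization.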
But there are practical limits on how long you can store data. In tomorrow’s blog, we’ll discuss just how long you need to keep it.