Best Practices for Increasing Data Availability
Understanding the importance of data availability is easy enough. Actually achieving high data availability, however, is harder.
If you’re wondering how you can improve data availability, keep reading. This post discusses some data availability best practices.
Build in redundancy
Perhaps the most basic step you can take to improve data availability is to ensure that your data is redundant. In other words, keep multiple copies of the data so that a failure in one of the disks, servers or databases that host it will not disrupt availability.
The challenge with redundancy is striking the right balance between redundancy and cost-efficiency. In a world where budgets are a thing, the number of copies of a database or data server that you can have running at the same time is limited. However, by studying data such as how often a given server or database fails and assessing how important different data workloads are, you can make an informed decision about how much redundancy to implement for each data source.
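As a rough illustration of that trade-off, the sketch below estimates how many copies are needed to reach a target availability. It assumes that copies fail independently of one another, which real systems often violate (shared power, shared network, correlated bugs), and the function names and figures are hypothetical, chosen only for this example.

```python
from typing import Optional


def combined_availability(single_copy_availability: float, copies: int) -> float:
    """Probability that at least one of `copies` replicas is up.

    Assumes every replica has the same availability and that failures
    are independent, which is an idealization.
    """
    if not 0.0 <= single_copy_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    # All copies are down at once with probability (1 - a) ** n.
    return 1.0 - (1.0 - single_copy_availability) ** copies


def copies_needed(single_copy_availability: float, target: float,
                  max_copies: int = 10) -> Optional[int]:
    """Smallest number of copies that meets `target`, or None if
    `max_copies` (a stand-in for the budget limit) is not enough."""
    for n in range(1, max_copies + 1):
        if combined_availability(single_copy_availability, n) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical workload: each copy is up 99% of the time and the
    # business wants 99.9% availability for this data source.
    print(copies_needed(0.99, 0.999))
```

Running the same calculation per workload, with measured failure rates in place of the made-up 99% figure, gives a starting point for deciding where extra copies are worth the cost.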
Automate failover
Data redundancy is great, but data redundancy combined with automated failover is even greater.
That is because automated failover (as the term implies) means that when a component of your infrastructure fails, a backup component automatically replaces it. By eliminating the need to wait for a human engineer to detect a failure and switch to a backup system, automated failover minimizes or completely avoids disruption to data availability.
Many monitoring and management tools for virtual servers and databases make it possible to configure automated failover. And if yours doesn’t, some simple scripting should suffice for ensuring that backup systems come online automatically when a primary system fails.
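To give a sense of what such scripting might look like, here is a minimal sketch of a failover monitor. The `check_primary` and `promote_backup` callables are placeholders for whatever health check and promotion command your environment actually provides; nothing here is tied to a specific tool. The monitor waits for several consecutive failed checks before failing over, so that a single dropped probe does not trigger an unnecessary switch.

```python
import time
from typing import Callable


class FailoverMonitor:
    """Promotes a backup after repeated failed health checks on the primary."""

    def __init__(self,
                 check_primary: Callable[[], bool],
                 promote_backup: Callable[[], None],
                 failures_before_failover: int = 3) -> None:
        self.check_primary = check_primary      # hypothetical health probe
        self.promote_backup = promote_backup    # hypothetical promotion hook
        self.failures_before_failover = failures_before_failover
        self.consecutive_failures = 0
        self.failed_over = False

    def poll_once(self) -> bool:
        """Run one health check. Returns True if this poll triggered failover."""
        if self.failed_over:
            return False
        try:
            healthy = self.check_primary()
        except Exception:
            # A probe that raises is treated the same as an unhealthy primary.
            healthy = False
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failures_before_failover:
            self.promote_backup()
            self.failed_over = True
            return True
        return False

    def run(self, interval_seconds: float = 5.0) -> None:
        """Poll until failover happens, sleeping between checks."""
        while not self.failed_over:
            if self.poll_once():
                break
            time.sleep(interval_seconds)
```

A real deployment would also need to guard against both systems believing they are primary at the same time, which this sketch does not attempt to handle.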
Avoid single points of failure
Another simple step that you can take to improve data availability is to avoid single points of failure—meaning infrastructure components or applications that would cause your data to become unavailable if they stop working correctly.
The concept here is similar to the point about redundancy made above, but there are differences between redundancy and eliminating single points of failure. You can have a redundant storage infrastructure composed of multiple servers and disks but still be at risk of having them become unavailable if, for example, the network router on which they depend crashes. In that case, your router would be your single point of failure. A well-architected infrastructure would avoid that risk by ensuring that not all data passes through a single router.
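One simple way to look for such components is to list every path a request can take to reach the data and check which components appear on all of them. The sketch below does exactly that; the component names are made up for illustration, and a real inventory would need to include less obvious dependencies such as power, DNS and authentication services.

```python
from typing import Iterable, List, Set


def single_points_of_failure(paths: Iterable[List[str]]) -> Set[str]:
    """Return the components that appear on every path to the data.

    If any one of these components fails, no path remains usable.
    """
    path_sets = [set(path) for path in paths]
    if not path_sets:
        return set()
    common = path_sets[0]
    for components in path_sets[1:]:
        common = common & components
    return common


if __name__ == "__main__":
    # Hypothetical layout: two redundant servers behind one router.
    paths = [
        ["router-a", "server-1", "disk-1"],
        ["router-a", "server-2", "disk-2"],
    ]
    print(single_points_of_failure(paths))  # the shared router

    # Adding a path through a second router removes the shared dependency.
    paths.append(["router-b", "server-2", "disk-2"])
    print(single_points_of_failure(paths))  # empty set
```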
Embrace software-defined infrastructure (where possible)
Generally speaking, software-defined infrastructure and storage help to maximize data availability. That is because when the infrastructure and file systems that store your data are defined in software and not directly integrated with the hardware that hosts them, they are easy to move around and to scale.
Keep in mind that software-defined environments can be as simple as a virtual server and virtual disks, which give you the benefits of storage that is abstracted from the underlying hardware. There are fancier ways to do software-defined infrastructure, too, such as by using a distributed storage system like GlusterFS or Ceph. But you need not get that complicated to take advantage of software-defined infrastructure and storage in order to improve data availability.
Obviously, you can’t migrate every type of data workload to a software-defined environment, and some software-defined environments are more sophisticated than others.
Establish and enforce RTO
RTO, or Recovery Time Objective, refers to the maximum amount of time that your business can tolerate a disruption to data availability. It is therefore the deadline by which data must be restored after a failure.
Depending on your industry, the amount of data you collect and other factors, you might be able to continue functioning for days without your data, or you may not last more than an hour before suffering critical damage to the business. If your business is a chain of coffee shops, recovering data instantaneously may not be as crucial as it would be if you are a bank and depend on digital data for nearly all of your operations.
If you haven’t yet figured out what your business’s RTO is, now is the time to do it. You don’t want to wait for a data availability disruption to occur before discovering how long your company can continue to function (or not) without its data.
Remember, too, that calculating RTO is only half the battle. In order to make RTO useful, you also need to ensure that you can recover from a disaster within the timeframe specified by your RTO. Here again, you do not want to wait until a disaster happens to test whether your disaster recovery plan is actually capable of meeting RTO requirements. Instead, run periodic drills to see how long it takes to recover data from backup databases and to switch to new data server instances to ensure that you are prepared to meet RTO requirements when the time comes.
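A drill of this kind can be as simple as timing the recovery procedure and comparing the result with the RTO. The following is a minimal sketch: `recover` stands in for whatever your actual restore or failover procedure is, and the names and numbers are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    elapsed_seconds: float
    rto_seconds: float

    @property
    def met_rto(self) -> bool:
        """True if the recovery finished within the RTO."""
        return self.elapsed_seconds <= self.rto_seconds


def run_recovery_drill(recover: Callable[[], None],
                       rto_seconds: float) -> DrillResult:
    """Time one run of the recovery procedure against the RTO."""
    start = time.monotonic()  # monotonic clock is unaffected by clock changes
    recover()
    elapsed = time.monotonic() - start
    return DrillResult(elapsed_seconds=elapsed, rto_seconds=rto_seconds)


if __name__ == "__main__":
    def restore_from_backup() -> None:
        # Placeholder for the real procedure, such as restoring a
        # database from backup and starting a new server instance.
        time.sleep(0.1)

    result = run_recovery_drill(restore_from_backup, rto_seconds=3600)
    print(f"Recovered in {result.elapsed_seconds:.1f}s; met RTO: {result.met_rto}")
```

Recording the result of each drill over time also shows whether recovery is getting slower as data volumes grow.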