5 (More) Metrics to Use for Measuring Data Quality
Data quality is not the type of thing that you typically track with a dashboard or assess in an annual report. But that does not mean that you cannot measure data quality—or that you should not be measuring data quality on a routine, ongoing basis. On the contrary, unless you measure data quality consistently, it’s impossible to guarantee that a given set of data will be actionable when you need it to be.
The first step in measuring data quality is to identify the metrics you need to track. We covered several data quality metrics in a previous article; in this one, we go further by looking at additional data points that can help you assess the quality of your data.
Data Management Costs
The costs associated with data management—which refers to the complete set of processes required to collect, transform, analyze and store data—tend to increase significantly when the data you are working with is of low quality. That is because data transformation takes longer, data analytics are more difficult, and storage costs can be higher.
Of course, other factors can affect data management costs, too, and rising costs do not necessarily mean that you have a data quality problem. But low-quality data is often one of the main culprits, so this is a useful metric to watch when you are worried about data quality.
Data Size
In many cases, low-quality data is also large in size. When your databases are filled with redundant entries, or you store outdated data for no good reason, your data takes up more space than it should.
Here again, data quality problems are not the only possible source of growth in data size. But if you find that the volume of data you have on hand is getting bigger and bigger, and you cannot attribute the trend to another cause, data quality problems might be at play.
Data Duplicates
Do you have multiple entries in your database for the same information? Or do you have multiple databases that contain the same data?
These and other examples of data duplicates undercut data quality. They make it harder to identify the original sources of data and increase the complexity of data management.
The more data duplicates you find within your data sets, the lower the quality of your data is likely to be.
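As a rough way to turn this into a number, you can compute a duplicate rate: the share of rows that repeat an earlier row on the fields you care about. The sketch below uses only the Python standard library; the record layout (customer ID, email) is a hypothetical example.

```python
# A minimal sketch of a duplicate-rate metric, assuming records can be
# compared as exact tuples. Real deduplication usually also needs fuzzy
# matching (typos, formatting differences), which is out of scope here.
from collections import Counter

records = [
    (1, "a@example.com"),
    (2, "b@example.com"),
    (3, "c@example.com"),
    (2, "b@example.com"),  # exact duplicate of an earlier row
]

counts = Counter(records)
# Every repeat beyond the first occurrence counts as a duplicate row.
duplicate_rows = sum(n - 1 for n in counts.values() if n > 1)
duplicate_rate = duplicate_rows / len(records)

print(f"{duplicate_rows} duplicate row(s), {duplicate_rate:.0%} of the data set")
```

Tracking this rate over time, per table or per source system, tells you whether deduplication efforts are actually working.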
Unnecessary Data
We live in a world of relatively cheap data storage. We are also reminded all the time of how powerful data is.
For both of these reasons, it can be tempting to collect as much data as you possibly can, and to store it as long as you can reasonably afford.
In fact, however, if you are collecting and storing data that you do not need, you could be undercutting your data quality strategy. Unnecessary data creates distractions and makes it harder to find the information you really need. If you find yourself with data pools that you never actually use, they are likely harming your data quality.
The best way to fight unnecessary data collection and storage is to establish data governance policies that clearly identify which types of data different teams should collect and how long they need to retain it.
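Once a retention policy exists, checking compliance with it can be automated. The sketch below is illustrative: the policy table, data types, and record layout are all assumptions, not a real governance framework.

```python
# A sketch of flagging records that have outlived their retention window,
# given a governance policy expressed as "data type -> retention days".
# The policy values and record fields here are hypothetical.
from datetime import date, timedelta

RETENTION_DAYS = {"web_logs": 90, "invoices": 2555}  # assumed policy

records = [
    {"type": "web_logs", "created": date.today() - timedelta(days=120)},
    {"type": "invoices", "created": date.today() - timedelta(days=30)},
]

def past_retention(record):
    """Return True if the record is older than its type's retention window."""
    limit = timedelta(days=RETENTION_DAYS[record["type"]])
    return date.today() - record["created"] > limit

expired = [r for r in records if past_retention(r)]
print(f"{len(expired)} record(s) past their retention window")
```

The count of expired records is itself a useful metric: if it keeps climbing, data that should have been purged is accumulating.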
Lack of Compliance with Data Standards
Data standards can be set by multiple authorities. Some are created internally by your company to define how data should be stored or transformed. Other data standards, such as the data formats that the U.S. Postal Service expects you to use when entering addresses, are established by external organizations.
No matter which data standards you are subject to, or who created the standards, the extent to which your datasets diverge from the standards is another useful measure of data quality. Data standards exist to help ensure interoperability and consistency, and lack of compliance with data standards undercuts your ability to use data wherever and whenever you need it.
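Measuring divergence from a standard can be as simple as validating each value against the standard's format and reporting the share that passes. The sketch below uses a deliberately simplified "standard" (a five-digit ZIP or ZIP+4 pattern); real postal addressing standards are far more detailed, and the sample values are made up.

```python
# A sketch of a standards-compliance metric: validate each value against
# an assumed format rule and report the fraction that conforms.
import re

# Simplified stand-in for a data standard: 5-digit ZIP, optional +4 suffix.
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")

zip_codes = ["30301", "94105-1420", "9410", "ABCDE", "60601"]  # sample data

compliant = [z for z in zip_codes if ZIP_PATTERN.match(z)]
compliance_rate = len(compliant) / len(zip_codes)

print(f"Compliance rate: {compliance_rate:.0%}")
```

The same pattern works for internal standards, too: encode the rule (a regex, a schema, a lookup table), run every record through it, and track the compliance rate over time.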
Check out our eBook on 4 ways to measure data quality.