To measure data quality – and track the effectiveness of data quality improvement efforts – you need, well, data. Keep reading for a look at the types of data and metrics that organizations can use to measure data quality.
Data quality refers to the ability of a set of data to serve an intended purpose. Low-quality data cannot be used effectively for the task you have in mind.
Whichever path you adopt for improving the quality of your data, you want to be sure that you have a way to measure the effectiveness of your efforts. Otherwise, you’ll be investing time and money in a data quality strategy that may or may not be paying off.
7 Metrics to Measure Data Quality
What does data quality assessment look like in practice? The following are examples of metrics that typically help a company measure the results of its data quality efforts.
1. The ratio of data to errors
This is the most obvious type of data quality metric. It allows you to track how the number of known errors – such as missing, incomplete or redundant entries – within a data set corresponds to the size of the data set. If you find fewer errors while the size of your data stays the same or grows, you know that your data quality is improving.
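As a rough illustration, the ratio can be computed by counting records that are incomplete or redundant and dividing by the size of the data set. The sketch below is only one way to do it: the function name `error_rate`, the `id` and `email` field names, and the rule that a repeated `id` counts as a redundant entry are all assumptions made for the example, not part of any particular tool.

```python
def error_rate(records, required_fields=("id", "email")):
    """Return the share of records that are incomplete or redundant.

    Assumptions for this sketch: each record is a dict, a field counts as
    missing when it is absent, None or an empty string, and a record is
    redundant when its "id" has already been seen.
    """
    seen_ids = set()
    errors = 0
    for record in records:
        incomplete = any(record.get(f) in (None, "") for f in required_fields)
        redundant = record.get("id") in seen_ids
        if incomplete or redundant:
            errors += 1
        if record.get("id") not in (None, ""):
            seen_ids.add(record["id"])
    # An empty data set has no errors to report.
    return errors / len(records) if records else 0.0


# Hypothetical sample data: one incomplete record and one duplicate.
sample = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
    {"id": 1, "email": "a@example.com"},
    {"id": 3, "email": "c@example.com"},
]
print(error_rate(sample))
```

Running the same calculation on a schedule and comparing the results over time shows whether the error ratio is falling as the data set grows.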
2. Number of empty values
Empty values, which usually indicate that information was missing or was recorded in the wrong field, are easy to count, which makes them a simple way to track this type of data quality problem. You can quantify how many empty fields you have within a data set, then monitor how the number changes over time.
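Counting empty values per field might look something like the following. This is a minimal sketch: the function name `count_empty_values` and the choice of what counts as "empty" (None, an empty or whitespace-only string, or the placeholder "N/A") are assumptions for illustration, and your own data may use different placeholders.

```python
from collections import Counter

# Assumption: these markers mean "no value" in the hypothetical data set.
EMPTY_MARKERS = (None, "", "N/A")


def count_empty_values(rows):
    """Return a Counter mapping each field name to its number of empty values."""
    empties = Counter()
    for row in rows:
        for field, value in row.items():
            if isinstance(value, str):
                value = value.strip()  # treat whitespace-only strings as empty
            if value in EMPTY_MARKERS:
                empties[field] += 1
    return empties


# Hypothetical sample rows.
rows = [
    {"name": "Ann", "phone": ""},
    {"name": None, "phone": "555-0100"},
    {"name": "Bo", "phone": "   "},
]
per_field = count_empty_values(rows)
print(dict(per_field), "total:", sum(per_field.values()))
```

Breaking the count down by field helps show whether the gaps are concentrated in one place, which often points to a specific collection or entry problem.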
3. Data transformation error rates
Problems with data transformation – that is, the process of taking data that is stored in one format and converting it to a different format – are often a sign of data quality problems. By measuring the number of data transformation operations that fail (or take unacceptably long to complete) you can gain insight into the overall quality of your data.
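One way to put a number on this is to run each transformation inside a small wrapper that counts failures and slow operations. The sketch below assumes a transformation is a plain Python function applied to one record at a time; the name `transformation_error_rate`, the `max_seconds` threshold and the `amount` field are hypothetical.

```python
import time


def transformation_error_rate(records, transform, max_seconds=1.0):
    """Return the share of records whose transformation failed or ran too long.

    Assumption for this sketch: a failure is a ValueError, TypeError or
    KeyError raised by `transform`, and "too long" means more than
    `max_seconds` for a single record.
    """
    problems = 0
    for record in records:
        start = time.perf_counter()
        try:
            transform(record)
        except (ValueError, TypeError, KeyError):
            problems += 1
            continue
        if time.perf_counter() - start > max_seconds:
            problems += 1
    return problems / len(records) if records else 0.0


def to_amount(record):
    """Hypothetical transformation: convert a text amount to a float."""
    return float(record["amount"])


# Hypothetical input: one clean record, three that cannot be converted.
records = [{"amount": "10.5"}, {"amount": "abc"}, {"amount": None}, {}]
print(transformation_error_rate(records, to_amount))
```

In a real pipeline the transformation would more likely be a job in an ETL tool, in which case the same rate can usually be derived from that tool's job logs instead.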
4. Amounts of dark data
Dark data is data that can’t be used effectively, often because of data quality problems. The more dark data you have, the more data quality problems you probably have.
5. Email bounce rates
If you’re running a marketing campaign, poor data quality is one of the most common causes of email bounces. They happen because errors, missing data or outdated data cause you to send emails to the wrong addresses.
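The bounce rate itself is simple arithmetic: bounced messages divided by messages sent. A minimal sketch, assuming you can export both counts from your email platform (the function name `bounce_rate` and the campaign figures are made up for the example):

```python
def bounce_rate(sent, bounced):
    """Return bounced emails as a fraction of emails sent."""
    if sent < 0 or bounced < 0 or bounced > sent:
        raise ValueError("counts must be non-negative and bounced <= sent")
    # A campaign with nothing sent has no bounces to report.
    return bounced / sent if sent else 0.0


# Hypothetical campaigns: (name, emails sent, emails bounced).
campaigns = [("January", 2000, 120), ("February", 2000, 50)]
for name, sent, bounced in campaigns:
    print(f"{name}: {bounce_rate(sent, bounced):.1%}")
```

A falling rate from one campaign to the next suggests that address data is getting cleaner, although other factors, such as a change of mailing list, can move the number too.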
6. Data storage costs
Are your data storage costs rising while the amount of data that you actually use stays the same? This is another possible sign of data quality issues. If you are storing data without using it, it could be because the data has quality problems. If, conversely, your storage costs decline while your data operations stay the same or grow, you’re likely improving on the data quality front.
7. Data time-to-value
Calculating how long it takes your team to derive results from a given data set is another way to measure data quality. While a number of factors (such as how automated your data transformation tools are) affect data time-to-value, data quality problems are one common hiccup that slows efforts to derive valuable information from data.
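If you record when a data set arrives and when the first usable result comes out of it, time-to-value is just the gap between the two timestamps. The sketch below assumes you track those two moments yourself; the function name `time_to_value_hours` and the dates are hypothetical.

```python
from datetime import datetime
from statistics import median


def time_to_value_hours(received_at, first_result_at):
    """Return the hours between receiving a data set and the first result."""
    delta = first_result_at - received_at
    if delta.total_seconds() < 0:
        raise ValueError("first_result_at must not be earlier than received_at")
    return delta.total_seconds() / 3600


# Hypothetical data sets: (received, first result delivered).
data_sets = [
    (datetime(2018, 1, 8, 9, 0), datetime(2018, 1, 10, 15, 0)),
    (datetime(2018, 1, 9, 9, 0), datetime(2018, 1, 9, 21, 0)),
    (datetime(2018, 1, 10, 9, 0), datetime(2018, 1, 11, 15, 0)),
]
hours = [time_to_value_hours(start, end) for start, end in data_sets]
print("median time-to-value (hours):", median(hours))
```

The median is used here rather than the average so that one unusually slow data set does not dominate the figure. That is a design choice for the example, not a rule.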
The metrics that make the most sense for you to measure will depend upon the specific needs of your organization, of course. These are just guidelines for measuring data quality.
The most important thing is to have some kind of data quality assessment plan in place, whatever its details may be.