How to Measure Data Quality – 7 Metrics to Assess the Quality of Your Data
To measure data quality – and track the effectiveness of data quality improvement efforts – you need, well, data. Keep reading for a look at the types of data and metrics that organizations can use to measure data quality.
Data quality refers to the ability of a set of data to serve an intended purpose. Low-quality data cannot be used effectively for whatever you want to do with it.
Whichever approach you take to improve the quality of your data, you need a way to measure how effective your efforts are. Otherwise, you’ll be investing time and money in a data quality strategy that may or may not be paying off.
7 Metrics to Measure Data Quality
What does data quality assessment look like in practice? The following metrics can help a company measure data quality and track the effectiveness of its improvement efforts.
|Metric|Definition|How to Calculate|
|---|---|---|
|Ratio of Data to Errors|How many errors do you have relative to the size of your data set?|Divide the total number of errors by the total number of items.|
|Number of Empty Values|Empty values indicate information is missing from a data set.|Count the number of fields that are empty within a data set.|
|Data Transformation Error Rates|How many errors arise as you convert information into a different format?|Divide the number of conversions that fail by the total number of conversions attempted.|
|Amounts of Dark Data|How much information is unusable due to data quality problems?|Measure how much of your data cannot be used because of data quality problems.|
|Email Bounce Rates|What percentage of recipients didn’t receive your email because it went to the wrong address?|Divide the total number of emails that bounced by the total number of emails sent, then multiply by 100.|
|Data Storage Costs|How much does it cost to store your data?|Check what your data storage provider charges you to store your information.|
|Data Time-to-Value|How long does it take for your firm to get value from its information?|Decide what “value” means to your firm, then measure how long it takes to achieve that value.|
1. The ratio of data to errors
This is the most obvious type of data quality metric. It allows you to track how the number of known errors – such as missing, incomplete or redundant entries – within a data set corresponds to the size of the data set. If you find fewer errors while the size of your data stays the same or grows, you know that your data quality is improving.
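As a rough illustration of the calculation in the table (total errors divided by total items), the Python sketch below compares two snapshots of the same data set. The function name `error_ratio` and the error and item counts are hypothetical, chosen only to show the arithmetic; how you count "errors" is up to your own validation rules.

```python
def error_ratio(error_count: int, total_items: int) -> float:
    """Errors per item in a data set (lower is better)."""
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    return error_count / total_items


# Hypothetical snapshots of the same data set, taken a month apart.
before = error_ratio(error_count=420, total_items=10_000)
after = error_ratio(error_count=310, total_items=12_000)

# Fewer errors per item while the data set grew suggests quality is improving.
improving = after < before
print(f"before={before:.4f} after={after:.4f} improving={improving}")
# before=0.0420 after=0.0258 improving=True
```

Tracking the ratio rather than the raw error count keeps the comparison fair when the data set grows between snapshots.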
2. Number of empty values
Empty values within a data set, which usually indicate that information is missing or was recorded in the wrong field, are an easy way to track this type of data quality problem. You can quantify how many empty fields you have within a data set, then monitor how that number changes over time.
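One way to quantify empty fields is to count them per column, so you can see which fields are most often left blank. The Python sketch below assumes records are dictionaries and treats `None` and blank or whitespace-only strings as "empty"; that definition, the helper names and the sample records are all hypothetical, and a field whose key is missing entirely is not counted here.

```python
from typing import Any, Iterable, Mapping


def is_empty(value: Any) -> bool:
    """Treat None and blank/whitespace-only strings as empty (an assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def count_empty_values(records: Iterable[Mapping[str, Any]]) -> dict[str, int]:
    """Count empty fields per column across all records."""
    counts: dict[str, int] = {}
    for record in records:
        for field, value in record.items():
            if is_empty(value):
                counts[field] = counts.get(field, 0) + 1
    return counts


# Hypothetical customer records.
customers = [
    {"name": "Ada", "email": "ada@example.com", "phone": ""},
    {"name": "Grace", "email": None, "phone": "555-0100"},
    {"name": "  ", "email": "linus@example.com", "phone": None},
]

per_field = count_empty_values(customers)
total_empty = sum(per_field.values())
print(per_field, total_empty)  # {'phone': 2, 'email': 1, 'name': 1} 4
```

Running the same count on a schedule and storing the results gives you the trend line the metric is really about.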
3. Data transformation error rates
Problems with data transformation – that is, the process of converting data stored in one format into a different format – are often a sign of data quality problems. By measuring the number of data transformation operations that fail (or take unacceptably long to complete), you can gain insight into the overall quality of your data.
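To make this concrete, the Python sketch below uses a simple transformation as a stand-in: converting US-style `MM/DD/YYYY` date strings to ISO 8601 with the standard library's `datetime.strptime`, which raises `ValueError` on input it cannot parse. The error rate is the share of conversions that fail. The date format, helper names and sample values are assumptions for illustration; a real pipeline would plug in its own transformations. This sketch does not measure slow conversions.

```python
from datetime import datetime


def convert_date(raw: str) -> str:
    """Convert a US-style MM/DD/YYYY date string to ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw, "%m/%d/%Y").date().isoformat()


def transformation_error_rate(values: list[str]) -> tuple[list[str], float]:
    """Convert each value; return the successes and the share that failed."""
    converted: list[str] = []
    failures = 0
    for raw in values:
        try:
            converted.append(convert_date(raw))
        except ValueError:  # strptime raises ValueError on malformed input
            failures += 1
    rate = failures / len(values) if values else 0.0
    return converted, rate


# Hypothetical raw values: one in the wrong format, one with month 13.
raw_dates = ["03/14/2023", "2023-03-15", "13/01/2023", "12/31/2022"]
ok, rate = transformation_error_rate(raw_dates)
print(ok, f"{rate:.0%}")  # ['2023-03-14', '2022-12-31'] 50%
```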
4. Amounts of dark data
Dark data is data that can’t be used effectively, often because of data quality problems. The more dark data you have, the more data quality problems you probably have.
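A simple, admittedly rough, way to estimate dark data is to count the share of records that fail at least one quality rule and therefore can't be used as-is. The Python sketch below assumes two hypothetical rules (an `id` must be present and an `email` must look like an address, using a deliberately loose regular expression). It measures records rather than storage volume; you could weight each record by its size instead if bytes matter more to you.

```python
import re
from typing import Any, Callable, Mapping

Record = Mapping[str, Any]

# Hypothetical rules: a record that fails any of them is treated as unusable.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
RULES: dict[str, Callable[[Record], bool]] = {
    "has_id": lambda r: r.get("id") is not None,
    "valid_email": lambda r: bool(EMAIL_RE.match(r.get("email") or "")),
}


def dark_data_share(records: list[Record]) -> float:
    """Fraction of records that fail at least one quality rule."""
    if not records:
        return 0.0
    unusable = sum(
        1 for r in records if not all(rule(r) for rule in RULES.values())
    )
    return unusable / len(records)


# Hypothetical records: one is missing an id, one has a malformed email.
records = [
    {"id": 1, "email": "ada@example.com"},
    {"id": None, "email": "grace@example.com"},
    {"id": 3, "email": "not-an-email"},
    {"id": 4, "email": "linus@example.org"},
]
print(f"{dark_data_share(records):.0%}")  # 50%
```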
5. Email bounce rates
If you’re running a marketing campaign, poor data quality is one of the most common causes of email bounces. They happen because errors, missing data or outdated data cause you to send emails to the wrong addresses.
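The bounce-rate formula from the table (bounced emails divided by emails sent, times 100) takes only a few lines of Python. The function name and campaign figures below are hypothetical; your email platform's reports are the usual source for the two counts.

```python
def bounce_rate(bounced: int, sent: int) -> float:
    """Percentage of sent emails that bounced."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= bounced <= sent:
        raise ValueError("bounced must be between 0 and sent")
    # Multiplying before dividing keeps round numbers exact in floating point.
    return bounced * 100 / sent


# Hypothetical campaign figures.
print(f"{bounce_rate(bounced=150, sent=5_000):.1f}%")  # 3.0%
```

Bounces can also come from causes unrelated to your data (full mailboxes, server outages), so a falling rate over successive campaigns is a more useful signal than any single figure.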
6. Data storage costs
Are your data storage costs rising while the amount of data that you actually use stays the same? This is another possible sign of data quality issues. If you are storing data without using it, it could be because the data has quality problems. If, conversely, your storage costs decline while your data operations stay the same or grow, you’re likely making progress on the data quality front.
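One way to turn this comparison into a number is to divide the storage bill by the volume of data you actually use in each period, then watch the trend. The Python sketch below assumes you can obtain a "used" figure (for example, the volume read by your analytics jobs); the `Period` class, the cost and volume figures, and the currency-agnostic units are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Period:
    label: str
    storage_cost: float  # what the storage provider billed for the period
    used_gb: float       # volume of data actually used (an assumed measure)


def cost_per_used_gb(p: Period) -> float:
    """Storage spend divided by the data actually put to use."""
    if p.used_gb <= 0:
        raise ValueError("used_gb must be positive")
    return p.storage_cost / p.used_gb


# Hypothetical quarters: the bill grows while data use stays flat.
periods = [
    Period("Q1", storage_cost=2_000.0, used_gb=800.0),
    Period("Q2", storage_cost=2_600.0, used_gb=800.0),
]
ratios = [cost_per_used_gb(p) for p in periods]
for p, r in zip(periods, ratios):
    print(f"{p.label}: {r:.2f} per used GB")
# Q1: 2.50 per used GB
# Q2: 3.25 per used GB

if ratios[-1] > ratios[0]:
    print("Costs are rising faster than data use; check for unused, low-quality data.")
```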
7. Data time-to-value
Calculating how long it takes your team to derive results from a given data set is another way to measure data quality. While a number of factors (such as how automated your data transformation tools are) affect data time-to-value, data quality problems are one common hiccup that slows efforts to derive valuable information from data.
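Once you've decided what "value" means for your team, time-to-value is the gap between two timestamps: when a data set arrives and when it first produces that result. The Python sketch below assumes a hypothetical log of arrival and "value" timestamps (the data set names and times are invented) and reports hours per data set plus the median, which is less sensitive to one unusually slow run than the mean.

```python
from datetime import datetime
from statistics import median

# Hypothetical log: when each data set arrived, and when it first produced
# the result your team defines as "value" (e.g., a published report).
runs = [
    ("sales_q1", "2024-04-01T08:00", "2024-04-03T17:00"),
    ("sales_q2", "2024-07-01T08:00", "2024-07-02T12:00"),
    ("sales_q3", "2024-10-01T08:00", "2024-10-05T08:00"),
]


def hours_to_value(ingested: str, valued: str) -> float:
    """Elapsed hours between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(valued) - datetime.fromisoformat(ingested)
    return delta.total_seconds() / 3600


durations = {name: hours_to_value(start, end) for name, start, end in runs}
print(durations)  # {'sales_q1': 57.0, 'sales_q2': 28.0, 'sales_q3': 96.0}
print(f"median time-to-value: {median(durations.values()):.1f} h")  # 57.0 h
```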
The metrics that make the most sense for you to measure will depend upon the specific needs of your organization, of course. These are just guidelines for measuring data quality.
The most important thing is to have some kind of data quality assessment plan in place, whatever its details may be.
For a deeper dive into data quality measurement, read our eBook 4 Ways to Measure Data Quality.
Syncsort offers data quality products that support data governance and compliance initiatives and produce a complete, single and trusted view of your data.