What Defines Data Quality, Anyway?
In a general sense, data quality is the ability of a data set to meet an intended goal. But what does that mean, specifically? Which factors go into achieving data quality? Let’s explore.
If you want to read more about the basics of data quality and the reasons why it’s important, check out the article “What is Data Quality? Explaining What Data Quality Actually Means.”
But if you already understand the core ideas behind data quality and want practical ways to achieve it, keep reading. The following are the main factors whose presence or absence determines the quality of a data set.
When you’re collecting data from a disparate set of sources, it’s easy to introduce inaccuracies into your data sets.
For example, OCR tools may make mistakes when reading analog information. Employees may make typos when manually entering information. Application errors could cause data to be written in the wrong format.
These types of problems can lead to entries within your data set that are just wrong. A customer’s name might be misspelled, for example, or a time and date may be recorded inaccurately.
It’s easy to imagine how data inaccuracies can harm data quality. If the data set on which you are basing decisions contains information that is wrong, and you don’t know it, you’ll fall short of achieving your goals.
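Some inaccuracies of this kind can be caught automatically before they reach the people making decisions. The following is a minimal sketch, not a complete validator: the field names `email` and `signup_date` and the function `find_invalid_rows` are hypothetical, and it assumes dates are stored as ISO 8601 strings. It only detects values that are malformed; a plausible-looking but misspelled name would still need to be checked against a trusted reference source.

```python
from datetime import datetime


def find_invalid_rows(rows):
    """Return (row index, reason) pairs for rows that fail basic sanity checks."""
    problems = []
    for i, row in enumerate(rows):
        # An address without exactly one "@" is very likely a typo or OCR error.
        if row.get("email", "").count("@") != 1:
            problems.append((i, "malformed email"))
        # Assumption: dates are ISO 8601 strings such as "2023-04-01".
        try:
            datetime.fromisoformat(row.get("signup_date", ""))
        except ValueError:
            problems.append((i, "unparseable signup_date"))
    return problems


rows = [
    {"email": "ana@example.com", "signup_date": "2023-04-01"},
    {"email": "bob.example.com", "signup_date": "2023-13-01"},  # two errors
]
problems = find_invalid_rows(rows)
```

Here the second row is flagged twice: once for the missing "@" and once for the impossible month.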
Perhaps the next-worst thing to inaccurate information is missing information. If some entries in your database are left blank, you may not be able to pursue your goals fully.
You can’t send a customized email to a customer if you know their email address but not their name, for example. (Well, you could try, but no one likes messages that start with “Dear [customer name]…” or are similarly impersonal.)
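A quick way to see how much information is missing is to count blank values per field. The sketch below assumes records are plain dictionaries; the function name `completeness_report` and the fields `name` and `email` are illustrative choices, not part of any particular tool.

```python
def completeness_report(rows, required_fields):
    """Count, for each required field, how many rows have it missing or blank."""
    missing = {field: 0 for field in required_fields}
    for row in rows:
        for field in required_fields:
            value = row.get(field)
            # Treat absent keys, None, and whitespace-only strings as missing.
            if value is None or str(value).strip() == "":
                missing[field] += 1
    return missing


customers = [
    {"name": "Ana Ruiz", "email": "ana@example.com"},
    {"name": "", "email": "bob@example.com"},  # blank name
    {"email": "cy@example.com"},               # no name key at all
]
report = completeness_report(customers, ["name", "email"])
```

In this example, `report` shows two customers without a usable name and none without an email address, which is exactly the “Dear [customer name]” situation described above.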
Data that is accurate and complete one day may cease to be so the next. Customers move and change mailing addresses. They get new jobs and new email addresses. They change their names.
And those are just the types of outdated data problems you can suffer when it comes to people. Other types of data sources can become outdated, too. A server’s IP address might change, for example, or a hardware update to a network switch might change the amount of traffic that it can handle.
Outdated information is essentially inaccurate information. It causes serious data quality challenges if it’s not corrected.
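One common way to keep outdated records from going unnoticed is to store the date each record was last confirmed and flag anything older than a chosen threshold. The sketch below assumes a hypothetical `last_verified` field holding an ISO 8601 date; the one-year default is an arbitrary example, and the right threshold depends on how quickly your data tends to change.

```python
from datetime import date, timedelta


def find_stale(rows, today, max_age_days=365):
    """Return the rows whose last_verified date is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        row for row in rows
        if date.fromisoformat(row["last_verified"]) < cutoff
    ]


records = [
    {"customer": "Ana Ruiz", "last_verified": "2023-06-01"},
    {"customer": "Bob Lee", "last_verified": "2021-03-15"},
]
# Passing "today" explicitly keeps the check repeatable and easy to test.
stale = find_stale(records, today=date(2024, 1, 1))
```

Only the second record is returned here, because it was last verified more than a year before the reference date.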
Copying data is easy. Avoiding the unintended consequences of having multiple copies of the same data is harder.
Consider, for example, what might happen if a database contains two entries for the same customer. A software application that uses this database to record payment information might modify only one of the entries when the customer pays a bill. If a different application were then to check the database to find out whether the customer has paid, it might find the other entry, which still says that the bill is unpaid. The customer could then be billed again, or flagged as delinquent, for a bill that has already been settled.
True, your software should be smart enough to control for problems like these. But dealing with data quality problems is not the job of software developers alone. Data engineers bear the primary responsibility for controlling data quality.
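Duplicates are rarely byte-for-byte identical, so it helps to compare records on a normalized key rather than on the raw values. The following sketch groups rows by a lowercased, whitespace-trimmed key; the function name `find_duplicates` and the choice of `email` as the key are assumptions made for illustration. Real-world matching often needs fuzzier rules than this.

```python
from collections import defaultdict


def find_duplicates(rows, key_fields):
    """Group row indexes that share the same normalized key; return groups of 2+."""
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        # Normalize case and surrounding whitespace so trivial variations match.
        key = tuple(str(row.get(f, "")).strip().lower() for f in key_fields)
        groups[key].append(i)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


customers = [
    {"name": "Ana Ruiz", "email": "ana@example.com", "paid": True},
    {"name": "Bob Lee", "email": "bob@example.com", "paid": True},
    {"name": "Ana Ruiz", "email": " ANA@example.com", "paid": False},
]
duplicates = find_duplicates(customers, ["email"])
```

Rows 0 and 2 are reported as the same customer, and they disagree about payment status, which is the billing problem described above.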
Inconsistent Data Formats
In a perfect world, all data would be stored in the same formats, all the time.
In the real world, the possibilities are virtually infinite when it comes to the types of formats in which data could exist. Different software applications, operating systems, database platforms and more all tend to format data in different ways. Sometimes, even different versions of the same tool may format data differently.
Sometimes, formatting differences aren’t a big deal. It’s unlikely that anyone has ever died because of the differences in the way that Unix and Windows format line endings in text files, for instance.
But when you’re dealing with complex data sets, formatting inconsistencies can hamper your ability to interpret data at the speed you need. If your analytics platform expects data to exist in one format, but your data sources store it in another, and you don’t have tools to transform the data, you’re in a poor position to achieve your goals.
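A small transformation step can often bridge the gap between what your sources store and what your analytics platform expects. As a rough sketch, the function below converts dates written in a few different styles into ISO 8601. The list `KNOWN_FORMATS` and the name `to_iso_date` are hypothetical, and the sketch assumes that slash-separated dates are month-first; if your sources mix month-first and day-first conventions, no parser can resolve the ambiguity without knowing where each value came from.

```python
from datetime import datetime

# Assumed input styles: ISO, US-style month/day/year, and day.month.year.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def to_iso_date(text):
    """Convert a date string in any known format to YYYY-MM-DD."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # Not this format; try the next one.
    # Fail loudly rather than guess at an unknown format.
    raise ValueError(f"unrecognized date format: {text!r}")


raw_dates = ["2024-03-07", "03/07/2024", "07.03.2024"]
normalized = [to_iso_date(d) for d in raw_dates]
```

All three inputs normalize to the same value, and anything that matches none of the known formats raises an error instead of passing through silently.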