Why You Should Understand All Dimensions of Data Quality
Take a second to think about your most valuable asset. What’s the first thing that comes to mind? In today’s world, it’s data. Yet, information is only valuable if it meets certain data quality standards.
There are six dimensions of data quality: completeness, consistency, uniqueness, validity, timeliness, and accuracy. Read on to learn how each of these affects data quality and what tools you can use to evaluate, improve, and monitor data quality to maximize the power of analytics.
The term “completeness” refers to how comprehensive information is. In other words, is all of the information you need actually available?
Incomplete information isn’t usable. One example is addresses; if you don’t have the street name, number, city, state, and zip code, your mailing won’t make it to its destination.
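To make the address example concrete, here is a minimal sketch of a completeness check in Python. The field names (street_number, street_name, and so on) and both helper functions are hypothetical and chosen only for illustration; they assume each record is a simple dictionary.

```python
# Hypothetical field names based on the address parts listed above.
REQUIRED_ADDRESS_FIELDS = ("street_number", "street_name", "city", "state", "zip_code")


def is_blank(value):
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or str(value).strip() == ""


def missing_fields(record, required=REQUIRED_ADDRESS_FIELDS):
    """Return the required fields that are absent or blank in one record."""
    return [field for field in required if is_blank(record.get(field))]


def completeness_ratio(records, required=REQUIRED_ADDRESS_FIELDS):
    """Return the fraction of records that have every required field filled in."""
    if not records:
        return 0.0
    complete = sum(1 for record in records if not missing_fields(record, required))
    return complete / len(records)
```

A record that lacks a zip code would be reported by missing_fields, and completeness_ratio gives a rough score you could track over time for a whole mailing list.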
“Consistency” means that representations of the same item match across all data stores. For example, if the same customer’s birthday is stored with the same value and in the same format in more than one database, that’s consistent.
There are two types of consistency: structural integrity and referential integrity. Structural integrity refers to consistency within a data source, while referential integrity applies to consistency between data sources.
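As a rough illustration of checking consistency between data sources, the sketch below compares one field for customers that appear in two stores. It assumes each store is a dictionary keyed by customer ID; the function name and the sample store names are hypothetical.

```python
def inconsistent_keys(store_a, store_b, field):
    """Return the sorted IDs found in both stores whose `field` values differ."""
    shared_ids = store_a.keys() & store_b.keys()
    return sorted(
        key for key in shared_ids
        if store_a[key].get(field) != store_b[key].get(field)
    )


# Hypothetical sample data: the same customers in two different systems.
crm = {
    "c1": {"birthday": "1990-04-12"},
    "c2": {"birthday": "1985-11-03"},
}
billing = {
    "c1": {"birthday": "1990-04-12"},
    "c2": {"birthday": "03/11/1985"},  # same date, different format
}
mismatches = inconsistent_keys(crm, billing, "birthday")
```

Note that this simple comparison flags a difference in format as well as a difference in value, which matches the birthday example above.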
“Uniqueness” means that something is one-of-a-kind. This dimension of data quality has two measures of its own: is something unique, and is it supposed to be unique?
For example, a name isn’t unique – there are many John Smiths out there. However, a customer identification number should be unique; otherwise, the wrong John Smith will receive offers or bills that aren’t meant for him.
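Both uniqueness questions can be sketched in a few lines of Python. The example below assumes records are dictionaries and that you decide up front which fields are supposed to be unique; the function and field names are hypothetical.

```python
from collections import Counter


def duplicated_values(records, field):
    """Return the sorted values of `field` that appear in more than one record."""
    counts = Counter(record[field] for record in records)
    return sorted(value for value, count in counts.items() if count > 1)


# Hypothetical sample data: two different John Smiths share one ID by mistake.
customers = [
    {"customer_id": 101, "name": "John Smith"},
    {"customer_id": 102, "name": "Jane Doe"},
    {"customer_id": 101, "name": "John Smith"},
]

# A repeated name is expected; a repeated customer ID is a data quality problem.
duplicate_ids = duplicated_values(customers, "customer_id")
```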
Information is valid when it matches the rules specified for it. For example, an address is valid if it contains a house number and the name of the street. A phone number containing those same values would be invalid.
Rules for validity include the data format (number of digits), allowable types, and range (minimum and maximum values).
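The three kinds of rules can be expressed as small checks. The sketch below is only an illustration: the five-digit zip code rule and the 0 to 120 age range are assumed example rules, not rules taken from any particular tool.

```python
import re


def is_valid_zip(value):
    """Format rule: a string of exactly five digits (an assumed example rule)."""
    return isinstance(value, str) and re.fullmatch(r"[0-9]{5}", value) is not None


def is_valid_age(value, minimum=0, maximum=120):
    """Type rule (must be an integer) plus range rule (minimum to maximum, inclusive)."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return minimum <= value <= maximum
```

For instance, a zip code with only four digits fails the format rule, an age stored as text fails the type rule, and an age of 150 fails the range rule.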
“Timeliness” means that information is up-to-date and available for use. Information is only considered “timely” if it meets both of these conditions.
For example, if a customer supplies you with an updated billing address, but you don’t use it when you send the next bill, that information isn’t timely.
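One way to picture the billing example is to compare the address printed on a bill with the most recent address on file when the bill was sent. The sketch below assumes address updates are stored as (timestamp, address) pairs; the function names and this data layout are hypothetical.

```python
from datetime import datetime


def latest_address(updates, as_of):
    """Return the most recent address recorded at or before `as_of`, or None."""
    known = [(timestamp, address) for timestamp, address in updates if timestamp <= as_of]
    if not known:
        return None
    return max(known, key=lambda item: item[0])[1]


def bill_used_timely_address(updates, address_on_bill, sent_at):
    """True if the bill used the newest address available when it was sent."""
    return address_on_bill == latest_address(updates, sent_at)


# Hypothetical sample data: the customer moved before the June bill went out.
updates = [
    (datetime(2023, 1, 5), "12 Old Rd"),
    (datetime(2023, 6, 1), "98 New Ave"),
]
```

If the June 15 bill still went to the old address, bill_used_timely_address would return False even though the updated address was already on file.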
Trillium DQ: Helping You Understand All Dimensions of Data Quality
The challenge with data quality dimensions is that they don’t all matter equally at any given time. That’s why it’s vital to have a tool in place to help you figure out what matters and what doesn’t. Syncsort’s Trillium DQ evaluates, improves, and monitors data quality to maximize the power of analytics.
Trillium DQ is a full suite of enterprise-grade data quality tools that transforms raw data into trusted business insights. This solution allows you to discover information anywhere in your organization so you can assess its accuracy and completeness. You’ll be able to find usable data sources inside and outside of your organization and analyze metrics that compare data against other standards.
Understanding which elements of data quality and data integrity matter most helps you get more out of your data. For more information on the state of data quality, take a look at our survey.