Data Integrity vs. Data Quality: How Are They Different?
Data is incredibly valuable, but that doesn’t mean it’s always an asset. When companies work with data that is substandard for any reason, it delivers incorrect insights, skewed analysis, and reckless recommendations.
Two terms describe the condition of data: data integrity and data quality. They are often used interchangeably, but there are important distinctions. Any company working to maximize the utility and value of its data needs to understand the difference.
Data quality is the broader of the two terms. Data can have integrity but not have quality. To understand why, consider what defines data quality:
- Completeness – The data on hand covers a sufficiently large share of the total data needed.
- Uniqueness – The data set is free of redundant or extraneous entries.
- Validity – Data conforms to the syntax and structure defined by the business requirements.
- Timeliness – Data is up to date or relevant to the timelines referenced by the business requirements.
- Accuracy – Data accurately describes the real-world context it refers to.
- Consistency – The same information is represented the same way throughout the dataset.
Quality data must meet all these criteria. If it is lacking in just one way, it could compromise any data-driven initiative.
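Several of these criteria can be measured directly. The following is a minimal sketch in Python, not a definitive implementation: the sample records, the field names, the 5-digit ZIP rule, and the one-year freshness window are all assumptions made for illustration. Accuracy and consistency are left out because they usually require comparison against an external reference.

```python
import re
from datetime import date

# Hypothetical customer records; the field names are illustrative assumptions.
RECORDS = [
    {"id": 1, "email": "ana@example.com", "zip": "30301", "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "zip": "9021", "updated": date(2019, 1, 15)},
    {"id": 1, "email": "ana@example.com", "zip": "30301", "updated": date(2024, 5, 1)},
]

ZIP_PATTERN = re.compile(r"^\d{5}$")  # assumed business rule: exactly five digits


def completeness(records, field):
    """Share of records in which the field is populated."""
    return sum(r[field] is not None for r in records) / len(records)


def uniqueness(records, key):
    """Share of records that remain after collapsing duplicate key values."""
    return len({r[key] for r in records}) / len(records)


def validity(records, field, pattern):
    """Share of populated values that match the required syntax."""
    values = [r[field] for r in records if r[field] is not None]
    return sum(bool(pattern.match(v)) for v in values) / len(values)


def timeliness(records, field, as_of, max_age_days):
    """Share of records updated within the allowed window."""
    fresh = sum((as_of - r[field]).days <= max_age_days for r in records)
    return fresh / len(records)
```

Each function returns a score between 0 and 1, so a team could set a threshold per criterion and flag any data set that falls short on even one of them.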
Overall, data quality refers to whether data is useful. Data integrity, by contrast, refers to whether data is trustworthy. Data must be trustworthy to be useful, but trustworthy data is not necessarily useful. Data integrity is judged on two dimensions:
- Physical Integrity – Using data is difficult when there are problems storing or retrieving it. Such problems raise questions about the data’s completeness, accuracy, and validity. The hardware that holds the data can be compromised by age, malfunction, maintenance problems, natural disasters, or power outages.
- Logical Integrity – Data becomes illogical when it is incorrect or irrational in some way. Data has to “make sense” for its context. Otherwise, it distorts the perspective of anyone basing decisions on the data. Logical problems can arise from design flaws, human errors, or software bugs.
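Both kinds of integrity can be checked in code. The sketch below is one possible approach under stated assumptions: a SHA-256 checksum stands in for a physical integrity check (detecting that stored bytes changed), and two simple rules stand in for logical integrity checks. The order fields and the rules themselves are hypothetical.

```python
import hashlib


def checksum(payload: bytes) -> str:
    """Fingerprint of the stored bytes, recorded when the data is written."""
    return hashlib.sha256(payload).hexdigest()


def is_physically_intact(payload: bytes, expected: str) -> bool:
    """True if the bytes read back match the fingerprint recorded earlier."""
    return checksum(payload) == expected


def logical_violations(orders, customer_ids):
    """Return (order_id, reason) pairs for orders that do not make sense."""
    problems = []
    for order in orders:
        # Assumed rule: every order must point at a known customer.
        if order["customer_id"] not in customer_ids:
            problems.append((order["order_id"], "unknown customer"))
        # Assumed rule: an order quantity must be positive.
        if order["quantity"] <= 0:
            problems.append((order["order_id"], "non-positive quantity"))
    return problems
```

A checksum only reveals that data changed in storage or transit; it says nothing about whether the original values made sense, which is why the logical checks are kept separate.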
A Simple Example About Quality and Integrity
In 1993, the United States Postal Service discovered that 23% of all mail was incorrectly addressed. That figure is surprising considering that addresses are fairly simple pieces of data that are easy to store, reference, or remember. Widespread address mistakes are a good example of how easily data can suffer from quality or integrity problems.
If the street number in an address is off by just one digit, the mail will not arrive at the right house. This is an example of poor data integrity because the data itself is invalid and untrustworthy. Now think about a person who has moved but not updated their address. The mail will arrive at the right house but will not reach the right recipient. This is an example of poor data quality because the data lacks timeliness, accuracy, and consistency.
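The two address failures can be told apart programmatically. This is a rough sketch that assumes a hypothetical system of record holding the address each customer originally supplied, plus a "confirmed" date on every stored record; neither comes from the USPS example itself.

```python
from datetime import date

# Hypothetical system of record: the address each customer actually supplied.
SOURCE_OF_TRUTH = {"cust-1": "128 Elm St", "cust-2": "40 Oak Ave"}


def address_issues(record, as_of, max_age_days=365):
    """Label a stored address with the kind of problem it shows, if any."""
    issues = []
    # Integrity: the stored value no longer matches what was supplied.
    if record["address"] != SOURCE_OF_TRUTH.get(record["customer"]):
        issues.append("integrity")
    # Quality: the value is intact but has not been confirmed recently.
    if (as_of - record["confirmed"]).days > max_age_days:
        issues.append("quality")
    return issues
```

A mistyped street number would show up as an integrity issue, while a correctly stored address that has not been confirmed in years would show up as a quality issue, even though every character in it is intact.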
Data integrity and quality issues are inevitable, but they are not unresolvable. Companies that make a proactive effort to fix existing data issues and prevent future ones see better outcomes from all their data-driven initiatives. To explore this issue further, read our eBook: 4 Ways to Measure Data Quality.