Does Your Data Measure Up? How to Assess Data Quality
Businesses today are increasingly dependent on an ever-growing flood of information. Whether it’s sales records, financial and accounting data, or sensitive customer information, the accuracy and adequacy of a company’s data are critical. If portions of that information are inaccurate or incomplete, the effect on the organization can range from embarrassing to catastrophic.
That’s why you, as an IT professional, should be committed to ensuring that the information your company relies on meets the highest data quality standards.
Measuring Data Quality
The term “data quality” refers to the suitability of data to serve its intended purpose. So, measuring data quality involves performing data quality assessments to determine the degree to which your data adequately supports the business needs of the company.
A data quality assessment is done by measuring particular features of the data to see if they meet defined standards. Each such feature is called a “data quality dimension,” and is rated according to a relevant metric that provides an objective assessment of quality.
The industry hasn’t yet settled on a standard set of data quality dimensions, but the following is a representative group:
- Completeness
- Validity
- Timeliness
- Consistency
- Integrity
Let’s take a brief look at each of these and at the metrics used in assessing them.
Completeness relates to whether all required information is present in the data set. For example, if the customer information in a database is required to include both first and last names, any record in which the first name or last name field is not populated is marked as incomplete. The metric used in assessing this dimension is the percentage of records that are complete.
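As a sketch of how this metric might be computed, the function below reports the percentage of records in which every required field is populated. The field names and sample records are illustrative, not from any particular system:

```python
def completeness(records, required_fields):
    """Percentage of records in which every required field is populated."""
    if not records:
        return 100.0
    complete = sum(
        1 for r in records
        if all(r.get(f) not in (None, "") for f in required_fields)
    )
    return 100.0 * complete / len(records)

customers = [
    {"first_name": "Ada", "last_name": "Lovelace"},
    {"first_name": "Grace", "last_name": ""},     # missing last name
    {"first_name": None, "last_name": "Turing"},  # missing first name
    {"first_name": "Alan", "last_name": "Kay"},
]
print(completeness(customers, ["first_name", "last_name"]))  # 50.0
```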
Data is characterized as valid if it matches the rules specified for it. Those rules typically include specifications such as format (number of digits, etc.), allowable types (integer, floating point, string, etc.), and range (minimum and maximum values). For example, a telephone number field that contains the string ‘1809 Oak Street’ is not valid. The metric for this dimension is the percentage of records in which all values are valid.
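One way to express such rules is as per-field regular expressions. The sketch below assumes a hypothetical US-style phone format; real rule sets would be tailored to your own schemas:

```python
import re

# Assumed formats for illustration only
RULES = {
    "phone": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
    "zip": re.compile(r"^\d{5}$"),
}

def validity(records, rules):
    """Percentage of records in which every rule-governed value matches."""
    if not records:
        return 100.0
    valid = sum(
        1 for r in records
        if all(pat.match(str(r.get(field, ""))) for field, pat in rules.items())
    )
    return 100.0 * valid / len(records)

rows = [
    {"phone": "555-867-5309", "zip": "60601"},
    {"phone": "1809 Oak Street", "zip": "60601"},  # invalid phone value
]
print(validity(rows, RULES))  # 50.0
```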
Timeliness relates to whether information is up-to-date for the intended use. In other words, is the correct information available when needed? For example, if a customer has notified the company of an address change, but the new address is not in the database at the time billing statements are processed, that entry fails the timeliness test. The metric used to measure timeliness is the time difference between when data is needed and when it is available.
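The timeliness metric is simply a lag calculation. A minimal sketch, using made-up timestamps for the billing example above:

```python
from datetime import datetime

def timeliness_lag(needed_at, available_at):
    """Seconds between when data was needed and when it became available.
    Zero or negative lag means the data arrived on time."""
    return (available_at - needed_at).total_seconds()

needed = datetime(2024, 3, 1, 9, 0)     # billing statements processed
available = datetime(2024, 3, 2, 9, 0)  # address change landed a day late
print(timeliness_lag(needed, available) / 3600)  # 24.0 (hours late)
```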
A data item is consistent if all representations of that item across data stores match. If, for example, a birth date is entered in one system using the U.S. mm/dd/yyyy format, but it is imported into another system where the date is entered using the European dd/mm/yyyy standard, that data lacks consistency. A paper published in the April 2002 issue of Communications of the ACM defines the metric for consistency as “the ratio of violations of a specific consistency type to the total number of consistency checks subtracted from one.”
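That definition translates directly into code. In this sketch, each cross-system comparison is one consistency check; the sample data echoes the swapped-date example above:

```python
def consistency(checks):
    """1 - (violations / total checks), following the definition above.
    `checks` is a list of booleans: True if a check passed."""
    if not checks:
        return 1.0
    violations = checks.count(False)
    return 1.0 - violations / len(checks)

# Birth dates as stored in two systems (hypothetical records)
system_a = {"cust-1": "1990-04-07", "cust-2": "1985-12-01"}
system_b = {"cust-1": "1990-04-07", "cust-2": "1985-01-12"}  # dd/mm swap

checks = [system_a[k] == system_b[k] for k in system_a]
print(consistency(checks))  # 0.5
```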
When critical linkages between data elements are missing, that data is said to lack integrity. An example would be a Sales Transactions table in which the customer ID points to a record in the Customers table. If a customer record is deleted without updating related tables, records in the Sales Transactions table that point to that particular customer become “orphans” because their parent record no longer exists. This represents a loss of referential integrity. An appropriate metric for data integrity would be the number of orphan records present in a database.
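Counting orphans amounts to checking each child record’s foreign key against the set of parent keys. A minimal in-memory sketch of the Sales Transactions example, with assumed field names:

```python
def orphan_count(transactions, customers):
    """Number of transactions whose customer_id has no matching customer."""
    customer_ids = {c["id"] for c in customers}
    return sum(1 for t in transactions if t["customer_id"] not in customer_ids)

customers = [{"id": 1}, {"id": 2}]
transactions = [
    {"id": 100, "customer_id": 1},
    {"id": 101, "customer_id": 3},  # orphan: customer 3 was deleted
]
print(orphan_count(transactions, customers))  # 1
```

In a relational database, the same check is typically enforced up front with a foreign key constraint rather than measured after the fact.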
How To Start
If you’ve never done a data quality assessment before, it can look a bit daunting. But it needn’t be. Sophisticated automated data quality solutions, such as those provided by Syncsort, can make the process straightforward.
Check out our eBook on 4 ways to measure data quality.