Integrating Data Quality into your Data Governance Strategy
How do you enforce data quality best practices? A good place to start is your data governance policy, which should be designed with data quality goals front and center.
To understand how data quality and data governance fit together, let’s start with some basic definitions.
Data quality is the ability of a given set of data to serve its intended purpose. If you lack data quality, it means you have data but can’t use it to achieve your goals because the data is inconsistent, contains errors, cannot be translated into the format you need, or suffers from another major problem.
Data governance refers to the set of rules and procedures that you put in place to control how data in your organization is collected, stored, processed and managed.
Bringing Data Quality Best Practices into Data Governance Policy
This blog has already covered how good data quality practices can reinforce adherence to data governance policies. The relationship between data quality and data governance goes further than that, however.
Your data governance policy itself should be designed with the goals of data quality in mind. In practice, that means building policies like the following into your data governance framework:
Avoidance of manual data entry
Manual data entry is much more likely to introduce errors into a dataset than machine-based data collection is. For this reason, your data governance policy should prohibit manual data entry whenever an automated solution can be used instead.
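As a rough illustration of what automated collection can look like, the sketch below loads a machine-generated export and validates each row instead of relying on hand-typed values. The column names, the sample data and the load_records helper are all hypothetical, chosen only for this example.

```python
import csv
import io
from datetime import datetime

# Hypothetical machine-generated export; the column names are illustrative only.
RAW_EXPORT = """customer_id,signup_date,country
1001,2023-04-01,US
1002,2023-04-02,DE
1003,not-a-date,FR
"""


def load_records(text):
    """Parse CSV text and split the rows into valid and rejected lists."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            record = {
                "customer_id": int(row["customer_id"]),
                "signup_date": datetime.strptime(
                    row["signup_date"], "%Y-%m-%d"
                ).date(),
                "country": row["country"].strip().upper(),
            }
        except (KeyError, ValueError, TypeError, AttributeError) as err:
            # Keep the bad row and the reason so someone can review it later.
            rejected.append((row, str(err)))
            continue
        valid.append(record)
    return valid, rejected


valid, rejected = load_records(RAW_EXPORT)
print(len(valid), "valid,", len(rejected), "rejected")
```

Rejected rows are kept alongside the reason they failed rather than being silently dropped, so a person only needs to step in for the exceptions.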
Preference for open standards
Data that is stored in formats based on open standards, as opposed to proprietary databases, is generally easier to translate into other formats or move between systems. Your data governance rules should require the use of open standards wherever possible to minimize the likelihood of ending up with a dataset that you cannot use because of formatting or translation issues.
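To make the point concrete, here is a minimal sketch that writes the same records to two widely used open formats, JSON and CSV, using only the Python standard library. The records and the helper names (to_json, to_csv) are assumptions made for this example, not part of any particular product.

```python
import csv
import io
import json

# Hypothetical records; in practice these would come from your own systems.
records = [
    {"customer_id": 1001, "country": "US"},
    {"customer_id": 1002, "country": "DE"},
]


def to_json(rows):
    """Serialize rows as JSON text with a stable key order."""
    return json.dumps(rows, indent=2, sort_keys=True)


def to_csv(rows):
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records)

# Any JSON parser can read the data back without proprietary tooling.
restored = json.loads(json_text)
```

Note that CSV does not carry type information, so numeric fields come back as text when the file is read again. That is one reason a governance policy might name a preferred open format for each kind of data.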
Strict data access control
The more people you have modifying a dataset, the harder it is to keep data formats consistent. Inconsistent data is low-quality data. This is one reason (security is another) why you should grant write access only to the people who need it.
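In code, a rule like this can be as simple as checking a list of approved writers before any change is accepted. The sketch below is only an illustration: the GovernedDataset class and the role names are hypothetical, and a real deployment would normally lean on the access controls of the database or storage platform itself.

```python
# Hypothetical set of identities that are allowed to modify the data.
WRITERS = {"etl_service", "data_steward"}


class GovernedDataset:
    """In-memory dataset that anyone can read but only approved users can change."""

    def __init__(self, rows, writers):
        self._rows = list(rows)
        self._writers = frozenset(writers)

    def read(self, user):
        # Return a copy so that readers cannot modify the stored rows.
        return list(self._rows)

    def append(self, user, row):
        if user not in self._writers:
            raise PermissionError(f"{user} does not have write access")
        self._rows.append(row)


dataset = GovernedDataset([{"id": 1}], WRITERS)
dataset.append("etl_service", {"id": 2})  # allowed

try:
    dataset.append("analyst", {"id": 3})  # not in WRITERS, so this is refused
except PermissionError as err:
    denied = str(err)
```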
Thorough documentation
Poor documentation, or failure to adhere to documented policies, clouds visibility into data, which in turn undercuts data quality. For this reason, your data governance policy should require anyone who works with data to adhere to documented procedures whenever possible. In situations where that is not possible, the procedure used to collect or analyze data should be documented clearly so that anyone else can understand it later. Relatedly, code that is used in data management should be written with comments explaining what it does.
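As a small example of commented data management code, the hypothetical function below states its assumptions in a docstring and explains each step inline. The function name and the validation rule are illustrative only.

```python
def normalize_country(value):
    """Return a two-letter, upper-case country code.

    Assumption for this sketch: inputs are already two-letter country codes
    that differ only in letter case or surrounding whitespace.
    """
    # Strip whitespace first so " us " and "US" are treated the same.
    code = value.strip().upper()
    # Reject anything that is not exactly two letters rather than guess.
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"unexpected country value: {value!r}")
    return code
```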
Organization-wide data education
Most of your organization’s employees don’t work with data. But they should still be educated about basic data management best practices. Your governance policy should require this education with the goal of turning everyone in your organization into a citizen data scientist. This will, in turn, help to ensure that data quality best practices are followed even in situations where the people working with data are not themselves data management experts.
In all of these ways, your data governance policies can reinforce data quality best practices, thereby helping to maximize your ability to derive value from the data you collect.