Best Practices for Creating Data Quality Rules
In my previous blog post, I shared four data quality trends I observed at recent industry events that are essential to data governance. In today’s post, I take a deeper dive into my conference presentation and review best practices for creating data quality rules that help organizations improve data governance and support business intelligence.
The 7 Dimensions of Data Quality Rules
When creating data quality rules, first consider which data quality dimensions you want to measure. There are up to seven dimensions, but you might want to start with just three or four.
Accessibility
- Definition: Software and activities related to storing, retrieving, or acting on stored data
- Monitor & Measure: Whether stewards and users have governed access to the data they need, and whether the organization is able to provide that governed access
Accuracy
- Definition: Closeness between data values and the real-world state they represent; transformation from the origin, with an understanding of provenance and lineage
- Monitor & Measure: How data changes over time and how it compares to industry or organizational standards
Completeness
- Definition: The extent to which expected attributes of data are present, as required by the data consumer or organization. (Data can be complete but not accurate.) Does it cover external or IoT sources (sensors)? Weather records? Call data records?
- Monitor & Measure: Measure the comprehensiveness of data and metadata. The dimension does not need to be 100% complete, but it must match expectations and policies
Consistency
- Definition: Whether the data matches across data values and data sets. Gartner describes this as “consistency of data across proximate data points.” E.g., if Chicago and Louisville are 30° and 32°, it’s unlikely that the temperature in Indianapolis is 70°
- Monitor & Measure: Uniformity of quality metrics across the organization
Relevance
- Definition: Closeness between the data consumer’s needs and the data provided, so the data can be used with maximum efficiency. Does it cover the relevant geographies?
- Monitor & Measure: Whether the data is fit for purpose; the percentage of all data required divided by all data provided
- See also: Big Data Context: Targeting Relevant Data that’s Fit for Purpose
Timeliness
- Definition: The extent to which the data is sufficient, up to date, and available when required
- Monitor & Measure: Data availability compared to the consumer’s time requirement
Validity
- Definition: Conformance to defined definitions and policies/rules; a value can be valid but still inaccurate
- Monitor & Measure: Compare data against policy (format, type, range)
Data Quality Policies and Implementation
Once you determine which dimensions are important, the data steward creates the data quality policies. Our customers document these policies in our partner products such as ASG Enterprise Data Intelligence and Collibra Data Governance Center.
These policies then need to be implemented, which is where data quality tools like Trillium Discovery come into play. The results can then be rendered in Discovery.
We’ve also built integrations with our partner products and are publishing APIs so customers can customize this integration with our partners or other tools in their ecosystem. This integration allows the results to appear in those partner products and in data quality dashboards.
For more information, read our eBook: Fueling Enterprise Data Governance with Data Quality