4 Data Quality Trends I Observed During Recent Data Governance Events
Over the past five weeks, I’ve attended and presented at four different events, including the Strata Data, Data Governance Financial Services, and Information Management MDM & Data Governance conferences in the New York metro area and ASG EVOLVE in Washington, DC.
What’s clear to me: data governance is still at varying levels of maturity across organizations, but it is as important as ever, if not more so.
Data Quality in Data Governance
Why is data quality such an important part of data governance programs? Consider this quote from a recent Harvard Business Review report: “only 3% of the Data Quality scores in our study can be rated ‘acceptable’ using the loosest-possible standard.” That not only speaks to the importance of data quality, it’s scary!
My presentation at these conferences focused on the intersection of data quality and data governance. I spoke about different aspects of both, including data profiling (understanding of what you have), data matching to create a single view, assigning data stewards and definitions, defining policies including data quality rules, and so on.
Data lineage is all about understanding where data came from and how it was changed along the way.
There were lots of nods from all of the audiences when I talked about data lineage. It’s not enough to understand where the data came from at the source level (this database table, that mainframe file). You also need to know whether it was joined with other data sources, aggregated, transformed, or otherwise changed along the way. You need to have data lineage at the field or column level.
This is a hard problem. We’re doing this now with our data integration tool Connect for Big Data, at the field level, publishing an API to get to the lineage and integrating with tools such as Cloudera Navigator and Apache Atlas.
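To make "field-level lineage" concrete, here is a minimal sketch in Python. It is not the Connect for Big Data API, nor the Cloudera Navigator or Apache Atlas model; the `LineageStep` class, the `trace_to_origin` function, and every dataset and field name are hypothetical, chosen only to illustrate the idea. Each step records which fields a target field was derived from and by what kind of operation, and the trace walks back to the original sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageStep:
    """One hypothetical lineage record: how a single field was produced."""
    target: str      # field that was produced, e.g. "report.customer_revenue"
    sources: tuple   # fields it was derived from
    operation: str   # what happened along the way: copy, join, aggregate...


def trace_to_origin(target, steps):
    """Return the set of original source fields that feed `target`."""
    by_target = {step.target: step for step in steps}
    origins, seen, stack = set(), set(), [target]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        step = by_target.get(current)
        if step is None:
            # Nothing produced this field, so it is an original source.
            origins.add(current)
        else:
            stack.extend(step.sources)
    return origins


# Illustrative data only: a database table and a mainframe file feeding a report.
STEPS = [
    LineageStep("staging.cust_id", ("crm_db.customers.id",), "copy"),
    LineageStep("staging.order_total", ("mainframe.orders.amount",), "copy"),
    LineageStep(
        "report.customer_revenue",
        ("staging.cust_id", "staging.order_total"),
        "join + aggregate",
    ),
]

origins = trace_to_origin("report.customer_revenue", STEPS)
print(sorted(origins))
```

Tracking lineage per field rather than per table is what lets you answer "which source fields does this report column depend on?" without assuming that every column in a table shares the same history.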
Key Data Elements
Let’s get back to data quality. During the presentation, I talked about identifying key data elements, assigning data stewards, and creating data quality policies on these data elements to create data quality dashboards.
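As a rough illustration of how data quality rules on key data elements can feed a dashboard, the Python sketch below scores two elements by the share of records that pass a rule. The element names, the rules themselves, and the sample records are all assumptions made up for this example; real policies would come from the data stewards.

```python
import re

# Hypothetical sample records; the field names are assumptions for illustration.
RECORDS = [
    {"customer_id": "C001", "email": "ann@example.com"},
    {"customer_id": "C002", "email": "bob(at)example.com"},
    {"customer_id": "", "email": ""},
    {"customer_id": "C004", "email": "cy@example.com"},
]

EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_present(value):
    """Completeness rule: the value exists and is not blank."""
    return bool(value and value.strip())


def is_valid_email(value):
    """Format rule: the value is present and looks like an email address."""
    return is_present(value) and EMAIL_PATTERN.fullmatch(value) is not None


# One rule per key data element (illustrative policies, not a standard).
RULES = {
    "customer_id": is_present,
    "email": is_valid_email,
}


def score(records, rules):
    """Return the fraction of records passing each element's rule."""
    result = {}
    for element, rule in rules.items():
        passed = sum(1 for record in records if rule(record.get(element)))
        result[element] = passed / len(records)
    return result


scores = score(RECORDS, RULES)
for element, value in scores.items():
    print(f"{element}: {value:.0%} of records pass")
```

A per-element pass rate like this is the kind of number a data quality dashboard can track over time, which is only manageable if the list of key data elements stays short.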
Key data elements do not number in the tens of thousands. In fact, the presenters in one session I saw this month said they originally identified over 21,000, then reduced that by 85%! By comparison, most organizations we work with start with 100-200, maybe even fewer.
Combining Business and IT Data Knowledge
The people who “know the data” are in the business, not necessarily in IT. In my presentation, I gave an example from a Proof of Concept we did a few years ago.
We were doing the POC with IT, and while profiling the data we came across a state field containing the value MH. This is obviously not a US state, but when IT asked the business, it turned out that MH was used for a person who worked in Massachusetts and lived in New Hampshire. When I mentioned this during my presentations and asked how many people had data like this, there were a lot of “yeses” and head nods.
The lesson here: the data stewards must be in the business.
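The profiling step in the MH story can be sketched in a few lines of Python: count the values in the state field and flag any that are not on a reference list. This is a minimal sketch, not the tool we used in the POC; the function name `profile_state_field` and the sample values are assumptions, and the reference list here is simply the 50 state codes plus DC.

```python
from collections import Counter

# Reference list: the 50 US state codes plus DC.
US_STATES = {
    "AL", "AK", "AZ", "AR", "CA", "CO", "CT", "DE", "FL", "GA",
    "HI", "ID", "IL", "IN", "IA", "KS", "KY", "LA", "ME", "MD",
    "MA", "MI", "MN", "MS", "MO", "MT", "NE", "NV", "NH", "NJ",
    "NM", "NY", "NC", "ND", "OH", "OK", "OR", "PA", "RI", "SC",
    "SD", "TN", "TX", "UT", "VT", "VA", "WA", "WV", "WI", "WY",
    "DC",
}


def profile_state_field(values):
    """Count normalized values and return those not on the reference list."""
    counts = Counter(v.strip().upper() for v in values if v and v.strip())
    unexpected = {code: n for code, n in counts.items() if code not in US_STATES}
    return counts, unexpected


# Illustrative sample values only.
sample = ["MA", "NH", "MH", "ma", "NY", "MH", ""]
counts, unexpected = profile_state_field(sample)
print("Value counts:", dict(counts))
print("Needs review by a data steward:", unexpected)
```

Note that the code can only flag MH as unexpected. Knowing that it means "works in Massachusetts, lives in New Hampshire" takes someone in the business, which is why a flagged value should go to a steward for review rather than be rejected automatically.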
4 Data Quality Trends Happening Now
I talked with a lot of people at these conferences. What I learned:
- Everyone confirmed data quality is a key part of data governance. In fact, you can’t have good governance without data quality.
- Companies are at varying levels of maturity with regard to measuring and improving the quality of their data.
- Some companies are already measuring their data quality, while many are just getting started.
- “Boiling the ocean” doesn’t work. Many said their organizations tried this and always failed. Start small, operationalize it, learn, show success, then move on or expand.
Think about how to get started within your organization. We obviously can help.
For more information, download our eBook: Fueling Enterprise Data Governance with Data Quality