During the Enterprise Data World conference last week, it was clear that many organizations are wrestling with rapid changes in information management and governance. Many are assessing where they are in this process, and some are still asking “where to start?”
Understanding Big Data Context to Find Relevant Data
William McKnight of McKnight Associates made an important point in his opening keynote: “don’t talk yourself out of starting.” Stan Christiaens of Collibra added that organizations are finding different places to begin, whether facilitating self-service analytics, enabling data stewards to care for data, working on critical compliance requirements, or freeing data scientists to find relevant data. But, as Mike Nicosia of TIAA commented, while maturity assessments may provide insight, “without context, you cannot make good decisions.”
This is a challenge for data-driven businesses as they endeavor to get actionable insights from critical enterprise data assets, leveraging next-generation Big Data environments.
My opening day was filled with tutorials on data modeling, wrapped around my own presentation, “Finding Quality in the Data Lake.” In the morning, I heard about advanced but traditional techniques for modeling the enterprise data warehouse. That afternoon, I learned about the challenges of modeling for NoSQL databases.
What struck me in comparing the two was context – that is, the understood context of a given piece of data. In the first, the originating context is stripped away to get to a model of an entity – a computerized representation through data of some real-world object. In the second, the context is maintained through the use of techniques such as document stores or graphed relationships. As the instructor in the latter tutorial noted, “context is critical.”
As I’ve recently reflected on the meaning of data quality in the emerging structure of the Data Lake, the notion of context for Big Data takes on primary importance. Nicosia used the analogy of a cholesterol test. If you’ve had the test and the doctor says you are at 250, what does that mean? Is it good, is it bad?
You need context – context that includes a definition of what the data is, how it’s recorded, whether it has a scale of measurement, and even whether there is a prior value or measurement for comparison.
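To make the analogy concrete, here is a minimal Python sketch of a reading that carries its own context. The class name, its fields, the recording method, and the prior value of 275 are all illustrative assumptions, not anything presented at the conference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    """A value plus the context needed to interpret it (hypothetical fields)."""
    value: float
    definition: str                      # what the data is
    method: str                          # how it was recorded
    unit: Optional[str] = None           # scale of measurement, if any
    prior_value: Optional[float] = None  # earlier reading for comparison

    def describe(self) -> str:
        unit = f" {self.unit}" if self.unit else " (unit unknown)"
        text = f"{self.definition}: {self.value:g}{unit}, recorded via {self.method}"
        if self.prior_value is not None:
            change = self.value - self.prior_value
            text += f", change of {change:+g} since the prior reading"
        return text


bare = 250  # without context, this number cannot be judged

# Illustrative values only; the method and prior reading are made up.
reading = Measurement(
    value=250,
    definition="total cholesterol",
    method="fasting blood test",
    unit="mg/dL",
    prior_value=275,
)
print(reading.describe())
```

The bare number and the `Measurement` hold the same value, but only the second can answer “is it good, is it bad?” because it records the definition, the scale, and a point of comparison.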
A Question of Big Data Context: How do we find relevant data for “John Doe”?
However, Big Data context is not simply a reflection of what data means. Andrew Patricio, former CDO of the District of Columbia Public Schools, asked, “What problem are you trying to solve?” There needs to be a focus on “relevant data.”
Theresa DelVecchio Dys, Director of Social Policy Research and Analysis at Feeding America, explained that their Data First Initiative started with a problem statement. As she put it, “not all data is good for all things.”
Data Quality Helps Target Fit For Purpose Data
For Feeding America, which coordinates a nationwide network of food banks serving over 46 million people each year, quality data is critical, yet its programs must focus on service and the operational processes that support it. The context of how and where data can be effectively and efficiently gathered is a key factor: too much focus on exactness in data collection upfront can lead to long lines, which cause the people they serve to turn away, the exact opposite of their intent! Patricio reiterated this point when he noted that “a goal of effectiveness instead of quality goes towards the solution.”
With an understanding that we, as part of organizations, are trying to solve problems, we can focus on asking key questions, testing hypotheses, and evaluating outcomes. These are activities that must be supported by data, in context, and allow us to make determinations as to what data is fit for purpose.
Laura Sebastian-Coleman, Data Quality Center of Excellence Lead for Cigna, noted specifically that data quality depends on:
- Fitness for Purpose – how well the data meets the expectations of consumers (always with some constraints)
- Representational Effectiveness – how consistent the data is with the defined or modeled concepts
- Data Knowledge – how well consumers understand and can decode the data
Without this knowledge, which depends on the context of the data, our Data Lakes or even our Data Warehouses are doomed to become “Data Graveyards.”
4 Steps for Achieving Trust in Your Data: 1) Know your goal or at least form a hypothesis, 2) Understand your data by measuring data quality, 3) Determine if it’s relevant data, i.e., “fit for purpose,” and 4) Document and validate your results
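The four steps can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not a method described by any of the speakers: the goal, the field names, the sample records, and the thresholds are all hypothetical, and completeness stands in for the many other data quality measures one might use.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def completeness(records: List[Record], field: str) -> float:
    """Step 2: share of records with a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def fit_for_purpose(records: List[Record],
                    required: Dict[str, float]) -> Dict[str, dict]:
    """Steps 3 and 4: compare each measurement with what the goal needs
    and return the result as a documented report."""
    report = {}
    for field, minimum in required.items():
        score = completeness(records, field)
        report[field] = {"score": score, "minimum": minimum,
                         "fit": score >= minimum}
    return report


# Step 1: the goal (hypothetical) is to count households served per ZIP code,
# so ZIP code must be nearly complete while phone number does not matter.
required = {"zip": 0.9, "phone": 0.0}

# Made-up sample records for illustration.
records = [
    {"zip": "20001", "phone": None},
    {"zip": "20002", "phone": "555-0100"},
    {"zip": "", "phone": None},
    {"zip": "20001", "phone": None},
]

report = fit_for_purpose(records, required)
for field, result in report.items():
    print(field, result)
```

The same records can be fit for one goal and unfit for another: here the sparse phone field passes because the goal does not need it, while the ZIP code field falls short of the threshold the goal sets.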
We make assumptions and take risks as we build out these data repositories. We assume that consumers understand what problems they are trying to solve. Sebastian-Coleman reminded us that we assume that consumers will:
- Recognize the contextual bias of the data (a point that I comment on in my prior blog post: Abundant data, scarcity of trust, and the need for data literacy)
- Know how to merge and reconcile data without prior knowledge
- Understand the incomplete nature of data sets
How CDOs Help Big Data Consumers Make Big Decisions
In the closing keynote, a panel of Chief Data Officers emphasized the need to understand the language of the business and the criticality of communication and transparency. This knowledge is key to helping data consumers make informed decisions with Big Data context. As McKnight commented in kicking off the conference, “top performers realize they need data, that they are in the business of data” regardless of their industry, and that “it takes knowledge and focus to get it right, not just more time and budget.”
There are a lot of starting points, a lot of pathways, in managing information in this rapidly changing data landscape. As McKnight said, “beyond the mountain is another mountain,” and Patricio reflected that this is a “continuous cycle of processing and evaluation.”
Our data lakes will not be static, and we cannot afford to let them become data graveyards. Preventing that requires us to continually reflect on the business problems we are trying to solve, to ask questions of the data, to understand the context of the data, and to measure and evaluate the fitness of the data for our purposes. With Big Data context in mind, we can mature our organizations and make more effective data-driven business decisions.