The Gartner Data and Analytics Summit this week opened with a number of dichotomies which our decision-makers face, including an abundance of data that we don’t always trust. Improving our data literacy may be the key to reversing this trend.
Data Abundance vs. Scarcity of Trust
One of the dichotomies that particularly resonated was that of Scarcity vs. Abundance. For instance, even though we are now in a world of abundant data, we continue to see a scarcity of trust. Some recent studies have shown that more than 60% of executives are not very confident in their data and analytics insights.
— Paige Roberts (@RobertsPaige) March 9, 2017
What results instead is another dichotomy: Confusion instead of Clarity. As I attended sessions covering a wide range of topics on data usage and analytics, I considered whether there was an opportunity to improve trust and quality in data while reducing confusion.
We have an exponential growth in data, computing power, and access, yet one analyst noted we only have a linear growth in the ability to use or consume that data. This creates a gap where we get overwhelmed by “facts” and instead fall back on “gut feel” or “sub-conscious” decision-making.
In her keynote, entrepreneur and author Margaret Heffernan noted that while we may have all the data, many times we find that no one will listen; that data alone will not drive change, particularly when it runs against the established, prevalent model of thought. This is at the heart of the trust issue: established models attract confirming data and repel disconfirming data.
The Need for Better Data Literacy
Yet at the same time, as Sam Esmail, creator and writer of the TV series Mr. Robot, noted, we can’t forget the part that data does not contain: human intelligence. Humans provide context, perspective, and solutions. There is a need, then, to come back to and invest in “data literacy” and to help us understand whether we are asking the right questions of data – the “why?” and “who?” rather than just “what?” and “how?”
— Philip On (@OnPhilip) March 7, 2017
What does this mean for understanding the quality of abundant data in our data lakes, a fundamental component to ensure not only trust in data but also validity in analytics and analytical models? Margaret Heffernan commented that “sometimes the data not going into the model is what counts” and that “anomalies are always interesting.”
Traditionally, our industry has had a very black or white view of data quality: it’s good or it’s bad. Data must conform to a number of dimensions such as completeness and validity if it is to be considered good data, and if it is bad then it’s an error that must be resolved or removed. That view may fit for tightly controlled operational processes, but if we’ve thrown out the anomalies, how can we truly test our data or find new business insights? Simply put, we can’t.
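One way to keep the anomalies available for exploration is to quarantine records that fail a quality rule instead of deleting them. The sketch below illustrates that idea in plain Python; the field names and rules (a completeness check and a validity check on an `amount` field) are illustrative assumptions, not anything from the summit or the article.

```python
# A minimal sketch of "quarantine, don't delete": records failing a
# quality rule are set aside for review rather than discarded, so
# anomalies remain available for later analysis.

def partition_by_quality(records, rules):
    """Split records into (conforming, anomalies) without losing either."""
    conforming, anomalies = [], []
    for record in records:
        failed = [name for name, rule in rules.items() if not rule(record)]
        if failed:
            # Keep the record and note which rules it broke.
            anomalies.append({"record": record, "failed_rules": failed})
        else:
            conforming.append(record)
    return conforming, anomalies

# Example rules: completeness (amount present) and validity (non-negative).
rules = {
    "amount_present": lambda r: r.get("amount") is not None,
    "amount_non_negative": lambda r: r.get("amount") is None or r["amount"] >= 0,
}

orders = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": None},   # incomplete -- but maybe interesting
    {"id": 3, "amount": -40.0},  # invalid -- a refund? or an error?
]

good, odd = partition_by_quality(orders, rules)
print(len(good), len(odd))  # 1 conforming record, 2 quarantined anomalies
```

The point of the pattern is that order 3’s negative amount might be a data-entry error – or it might be the refund pattern a new analysis is looking for; quarantining preserves that choice for the analyst.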
We need to step back and understand that “data literacy” requires us to ask and understand the business problem first (the “why?”), to understand what different users need (the “who?”), and then to understand the questions we may need to ask of data. As Heffernan observed, “one of the greatest uses of data is to provide disconfirmation of mental models.” We can’t do “data quality” for the sake of achieving “data quality.”
Instead, we need to provide a platform to bring in the range of data that may be relevant, including an understanding of its original context. Then, we let the data tell us what it can, see what it shows us even if it doesn’t fit the mental model of “good” data, and finally establish what quality data means to that business problem and the models, algorithms, and analytics we build to address it. (All the while bearing in mind that the data quality requirements of one problem may be completely different from those of another.)
This approach changes how, what, and where we establish “data quality” in the data lake. Data quality shifts away from being a gatekeeper or filter to the data lake, to becoming a core part of the toolset to understand, explore, and refine the data that has arrived for the different users to take advantage of.
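Quality-as-exploration can be as simple as profiling what actually landed in the lake, rather than rejecting rows at the gate. The toy sketch below summarizes per-field completeness and distinct values so each consumer can judge fitness for their own question; the field names and sample rows are hypothetical.

```python
# A toy profiling step: instead of filtering data on the way in,
# summarize what arrived (completeness per field, distinct values)
# so different users can decide what "quality" means for their problem.

def profile(records):
    """Per-field completeness ratio and distinct-value count."""
    fields = {f for r in records for f in r}
    summary = {}
    for f in sorted(fields):
        values = [r.get(f) for r in records]
        present = [v for v in values if v is not None]
        summary[f] = {
            "completeness": len(present) / len(records),
            "distinct": len(set(present)),
        }
    return summary

landed = [
    {"country": "US", "channel": "web"},
    {"country": "US", "channel": None},
    {"country": "DE", "channel": "store"},
    {"country": None, "channel": "web"},
]

for field, stats in profile(landed).items():
    print(field, stats)
```

A marketing analyst might find 75% completeness on `channel` perfectly usable, while a finance reconciliation would not – which is exactly the point of moving the quality decision to the consumer.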
Data literacy, then, is about providing users and consumers with a scientific approach towards data (including data quality) that allows them to frame questions in ways that help establish clarity rather than confusion; to prove or disprove established models; to generate new models, analyses, and reports that are supported by data; and to achieve understanding and insights that move all of us towards a greater abundance of trust.
Check out these Common Data Quality Use Cases to learn more about how to build trust in data by ensuring the most accurate, verified, and complete information.