Data democratization: finally living up to the name
Editor’s note: This article on data democratization, written by Syncsort’s Harald Smith, was originally published on InfoWorld.
Data democratization is the idea that digital information should be accessible and understandable to the average end user as a basis for decision-making. Data democratization has been promoted as a competitive advantage in the global economy and a desirable, egalitarian end-state where all decisions are data-driven. But has this been the reality?
In practice, I’ve found this goal of broad access has been isolated to corporations and corporate data. Most articles on data democratization quickly move from statements about accessibility to a narrower focus on organizational initiatives making data available to employees. That implies a restricted scope for data democratization, specifically not within the public domain, but limited to private sector organizations. Mostly, data “democratization” has been a buzzword for “data accessibility” with no public involvement or collaborative data use.
Until now. Today, we’re seeing data democratization broaden through the rise of user-friendly public data; organizations providing content through programs like OpenNASA, which makes all of NASA’s open data, code, and APIs publicly available, or OpenData500, which tracks the use of public data; and citizen data scientists.
The evolution of data access
When I began my career, data was handwritten onto paper, entered onto punch cards, and read into a computer. Programmers extracted that data, turning it into user reports.
By the mid-1980s, many business users had access to personal computers, spreadsheet software, and business reports for budgeting and forecasting. While these users and their data were typically isolated from IT, creating silos distinct from the core business systems, they knew the business processes and context for that data.
Subsequent decades brought new data storage alternatives (such as databases, data warehouses, and data marts). These included applications to capture incoming data and provide consistent business processes, along with specialized interfaces such as extract, transform, and load (ETL) and business intelligence (BI) tools to connect the data among applications, databases, and the spreadsheets still actively used. At times, standards emerged for certain categories of tools (such as SQL), but accessing data was the technical experts’ domain, while business users applied data for business purposes. Isolated silos remained, divided by distinct lines of business, applications, data stores, and the ubiquitous spreadsheet.
Making data democratic today
Over the last decade, these silos have been exacerbated by increasing volumes and variety of data, including sensors, social media, call data records, and data integration technologies. Organizations want to use their data, coupled with third-party data like demographics, geospatial, or even weather data, to drive business insight. But there aren’t enough experts in data storage and tools who also understand the business context around them to support organizational goals.
Enter three changes in the landscape: data science teams, citizen data scientists, and public insight into individuals’ data use. Data science teams include employees both with and without data science backgrounds who learn the business context and work with tools and data to produce analyses, models, and algorithms that previously would’ve required professional data expertise. Many companies are launching training programs to build such teams; Sears is one example, training 400 members of its BI operations staff to perform customer segmentation tasks. By avoiding specialization, the company saved significant data-preparation costs.
Citizen data scientists may (or may not) have science degrees, but use public data to drive insights outside the corporate structure, or even to control their own data. They’ve become so prolific that Gartner predicts that by 2019 they’ll surpass professionals in the amount of analysis produced. But this is an area where domain knowledge and statistical training are critical; otherwise, it’s easy to produce false correlations in an era of fake news.
Finally, public demand is shifting control over personal data from the private sector to the individual. In Europe, GDPR gives residents more control over how companies use their data, with explicit rights to find out what data an organization stores about them, to demand corrections to that data, and even to require organizations to “forget” them.
That’s why I believe we’ve entered a new chapter where technology has caught up to the original aspirations behind data democratization:
- There are no data silos.
- Everyone can become data-literate.
- Everyone can access the tools needed to find and work with data.
- Everyone is empowered to make data-driven decisions, and the broader culture (organizational or national) embraces this empowerment.
- Everyone is responsible for data and decisions around it.
As data becomes truly democratic, there are corresponding concerns around governance, security and keeping the data fresh. These are among the issues I’ll examine as we consider how we work with and responsibly govern data.