In Part 1 of this two-part interview, Satyen Sangani (@satyx), CEO and co-founder of Alation, spoke about data cataloging. In today’s Part 2, he provides his thoughts on trends and best practices in Big Data management.
What are some of the more outdated or inefficient processes involved with accessing relevant data today? What is slowing businesses down?
Extracting data from the data lake and getting the right data to the right person exactly when they need it can take a long time.
Businesses are moving to self-service analytics solutions that don’t require IT involvement to access and work with data. However, self-service tools often fail to help users understand how to use the data appropriately. Specifically, users don’t always know which data sets to use, which definitions apply or which metrics are correct.
What should companies be doing today to prepare for how they’ll use data in the future? What should their long-term strategies look like?
Ultimately, you want to get data, business context and technical context in front of your employees as quickly as possible. The days when you could take months to prepare a report are over.
Given this, companies need to spend time thinking about a) how they can get data to their employees as fast as possible and b) how to train their workforces to find, understand and use that data to gain insights quickly.
What’s one piece of advice you find yourself repeating to your clients over and over? Something you wish more companies were doing to get more out of their data?
Data governance has traditionally implied a top-down, command-and-control approach. Such an approach generally works when compliance is the primary goal, but when the goal is to get data consumers to use data more often, it’s important to take an iterative, agile approach to data governance.
It’s less about prescribing rules than about responding to users, gently correcting and improving their behavior.
What trends or innovations in Big Data management are you following today? Why do they excite you?
Self-service is, of course, a big one. We also like distributed computation engines like Presto and Spark. The notion that we can decouple compute from storage is finally becoming a reality.
AI and Machine Learning need to be embedded into every layer of the stack. There’s too much manual work in data and that manual work comes at the cost of speed.
To learn how to put your legacy data to work for you, and plan and launch successful data lake projects with tips and tricks from industry experts, download the TDWI Checklist Report: Building a Data Lake with Legacy Data.