In Part 1, Roi discussed the future of Big Data management and how machine learning and natural language can be leveraged. Here in Part 2, he provides insights on re-imagining the way businesses manage data, data management challenges and how to grapple with them by building a data-driven culture.
How are you re-imagining the way businesses manage data?
The way businesses manage their data has, oddly, taken a very different path from the way they manage other engineering tasks. When you consider things like owning infrastructure, managing source control, monitoring, deployments and backups – almost no one spends time on them, because there are great services out there that simplify and automate things for them.
When it comes to data management, I think we’re still a decade behind, with most businesses investing months and hundreds of thousands of dollars to build and maintain everything in-house. I think it’s a clear indication that they’re out of options and that there’s a big problem to solve here. The technology is there, but the expertise and time required to tame it are still overwhelming.
I’m sure that in the next few years we’ll see more ecosystems of Big Data services that can be instantly set up and used. You want the most advanced ML capabilities? Sure – one click of a button and you’re good to go. Big Data will become more specialized as it becomes more advanced, and external services will package all of its promise into simple and fast tools.
What are some of the biggest headaches organizations face with data analytics? What are the causes of these frustrations?
I usually like to think of it this way – if you take the most non-technical smart person you know and ask them about analytics, you’ll see that they’re able to make sense of it. They’re able to reason about data and how it’s related, and are able to come up with great ideas about how to learn something new. That’s because data, or even analytics, isn’t a unique technical challenge – it’s a logical one.
But unfortunately, the only interface we have with computers for carrying out complex analytics involves a lot of coding, infrastructure and engineering just to get to the core of what we’re trying to do. We have to invest so much in menial tasks like data cleanup, query optimization, partitioning, archiving, etc. Only once we have all that can we start exploring.
This is backwards, and in my opinion, this is the biggest challenge. I know that there are many teams out there that are fully capable of doing all of that, but you have to pick your battles, and a smart team will usually focus on the domain they’re trying to conquer rather than on learning more about Big Data (unless the two coincide).
And every year – with more technologies and capabilities – this challenge worsens.
What should organizations be doing today to improve how they manage data long term? How can they set themselves up for future success?
Above all, I think it all starts with a data-driven culture. And having that culture involves much more than just saying “We’re data-driven,” obviously.
It’s so common to see self-proclaimed data-driven teams that rarely analyze their data or users’ behavior. Organizations need to be honest about their priorities, and if data is a priority, they should create a concrete plan to execute on it. Even the best technology imaginable won’t help a team that’s not really interested in a deeper understanding of its market.
Secondly, I like to think that sometimes it’s better to run before you walk. If you have a ground-breaking idea of what you can learn and do with your data, you don’t have to start with all of the mediocre stuff that everyone else does. Be bold and jump ahead.
That, of course, doesn’t replace the need for basic analytics, but it does mean that, in my opinion, successful teams will be creative and non-linear in how they use their data.
What trends in data management and data analytics are you following today? Why do they interest you?
Data lakes. Well, more specifically, query engines for data lakes.
Presto, in my opinion, as great as it is, is just getting started. The vision of an endless ocean of data that’s easily queryable with interactive performance is slowly taking shape.