Expert Interview (Pt 1): Josh Poduska, Chief Data Scientist at Domino Data Lab on Making Production Data Science More Achievable
At the recent Strata Data event in NYC, Paige Roberts of Syncsort had a chance to sit down and speak with Josh Poduska, the Chief Data Science Officer at Domino Data Lab. In part one of this two-part series, Roberts and Poduska discuss data science maturity levels and how to make data science more achievable and practical.
Roberts: Let’s start with introducing yourself. Can you tell our readers a little about yourself?
Poduska: I’m Josh Poduska, Chief Data Science Officer with Domino Data Lab. I’ve been in the data science industry for about 17 years. I’ve been doing it since before data science was a thing, and seen a lot of changes. I’m really excited about what’s ahead for the discipline.
So for those who are unfamiliar with Domino Data Lab, can you give us an idea of what you guys do?
Sure. Domino has built the data science platform that basically helps everyone run their business on models. We recently created a new category that we call “model management.” It emphasizes the principles of being model-driven, and of orchestrating your workloads and your business around models and data as essential assets, to enable you to reap the promise of data science. Data science, machine learning, and AI have a lot of hype, but for too many organizations, the real promise behind that is unrealized. So, Domino is helping facilitate the realization of that return on investment.
Making it a little more practical, more achievable by the average company?
Do you specialize in any particular industries, or across the board?
Across the board. Some verticals that we’ve had early successes in are those that have regulatory requirements. So, financial services, some of the life sciences. Anywhere there’s a need to reproduce your scientific results, to track what you’re doing, to work as a collaborative team, which is really every industry. But industries that are thinking about that more than others right now are financial services and life sciences. So, those have been some early wins of ours. We’re also deep in the insurance industry, and some of our newer customers are in retail. Allstate is a customer, GAP’s a customer. Bristol-Myers Squibb, S&P Global, so, really it runs the gamut.
But the heavily regulated ones are the ones that are first in line?
First in line, right. They’re being asked right now to justify the investment in data science. For example, you built a fraud model that saves the business a billion dollars a year. Justify why that model is not biased. Justify the lineage of that model. Where did it come from, from inception to production? How are you going to monitor that model in the future? These are all things that Domino has built systems in the platform to help with. So, the regulated industries have been drawn to us.
Since I’ve gotten to know more about Domino in the time that I’ve been here, I see a lot of benefit really for smaller organizations too. What Domino helps you do is it helps you establish best practices for doing data science, and that best practice muscle memory and way of doing things is so important to the younger companies.
If the younger companies can do data science the right way, they’re going to break out of that young company phase, and really start to see the benefits that are the reason why they got into that business in the first place. For them, it’s life and death. For the more mature companies, it’s just incremental gains. But because we help manage the data science process, it’s an easier fit for companies with data science teams of 5, 10, 20 or more data scientists. But like I said, I still see strong value for the younger companies, too. It’s kind of fun to work with them because you see so much growth and so much of a difference being made so quickly.
You just did your presentation. That went really well. Can you talk about a couple of the big concepts?
I think the big concept is that data science has now gotten to a maturity level as a discipline where we need to stop treating it like a backroom-tech project, and start incorporating it into the organization. What our tutorial was about was making models central to your organizational capability. There’s so much that goes into that. Some of the highlights are: Having your data science team really integrate and listen to the business and know how to work with the business, and having a strategy for putting projects into production. Having a pre-flight checklist. Having a plan for how you deliver models. Talking to the consumers of models before you even start the process of building them. Understanding how it’s going to be consumed to ensure that it’s going to satisfy the need.
Then along the way, safeguarding for model liability, model bias, safeguarding your model so that it doesn’t turn into shelfware. There’s a lot that goes into it. Because we’re at that maturity level, it’s pretty exciting to see organizations move to the next step, and say, “All right. We’ve hired some data scientists. We have data. Now, how can we integrate this with the business? How can we integrate this with IT, with sales, with marketing, with software engineering?” Data science is becoming a first-class organization within companies. The companies that are doing that, like Amazon, like GAP, like Allstate, are the ones that are really starting to lead the way in their industries.
I know there’s a lot of challenge between the step where you have a business use case, a trained model, data sets, and then getting all that into production. I know a lot of data science projects get stopped there. Do you help with that kind of leap?
We do. In the tutorial, we made the point that the biggest barrier to making data science an organizational capability is process and culture. But technology can help enable that process and culture change to happen.
We’re trying to create a virtuous cycle and help enable that virtuous cycle between the science of creating the models, and the reality of putting them into production. And then, creating a feedback loop where you’re monitoring those models in production and connecting the dots.
Refining the model. Putting them back in. It’s like, “Okay. This is working really well, but how about this? What if we add something else?”
Make sure to check back for part two where Roberts and Poduska talk about data lineage and data bias.
- Josh’s Strata presentation, Managing Data Science in the Enterprise