Expert Interview (Part 2): James Kobielus on Reasons for Data Scientist Insomnia including Neural Network Development Challenges
In the first half of our two-part conversation with Wikibon lead analyst James Kobielus (@jameskobielus), he discussed the incredible impact of machine learning in helping organizations make better business decisions and be more productive. In today’s Part 2, he addresses what aspects of machine learning should be keeping data scientists up at night. (Hint: neural networks)
Several Challenges Involved with Developing Neural Networks
Developing these algorithms is not without its challenges, Kobielus says.
The first major challenge is finding data.
Algorithms can’t do magic unless they’ve been “trained.” And in order to train them, the algorithms require fresh data. But acquiring this training data set is a big hurdle for developers.
For eCommerce sites, this is less of a problem – they have their own data in the form of transaction histories, site visits and customer information that can be used to train the model and determine how predictive it is.
But the process of amassing those training data sets when you don’t have data is trickier – developers have to rely upon commercial data sets that they’ve purchased or open source data sets.
After getting the training data, which might come from a dozen different sources, the next challenge is aggregating it so the data can be harmonized with a common set of variables. Another challenge is having the ability to cleanse data to make sure it’s free of contradictions and inconsistencies. All this takes time and resources in the form of databases, storage, processing and data engineers. This process is expensive but essential. (For more on this, read Uniting Data Quality and Data Integration)
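The harmonization and cleansing steps described above can be sketched in a few lines of pandas. The data sources, column names, and cleanup rules below are hypothetical, chosen only to illustrate mapping two differently-shaped data sets onto a common set of variables and removing inconsistencies:

```python
import pandas as pd

# Two hypothetical training-data sources with different schemas.
source_a = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "purchase_amt": [19.99, 5.00, 42.50],
    "country": ["US", "us", "DE"],   # inconsistent casing
})
source_b = pd.DataFrame({
    "cust": [4, 5],
    "amount": [7.25, None],          # a missing value
    "country_code": ["FR", "FR"],
})

# Harmonize: map both sources onto one common set of variables.
source_b = source_b.rename(columns={
    "cust": "customer_id",
    "amount": "purchase_amt",
    "country_code": "country",
})
combined = pd.concat([source_a, source_b], ignore_index=True)

# Cleanse: normalize inconsistent country codes, drop incomplete rows.
combined["country"] = combined["country"].str.upper()
combined = combined.dropna(subset=["purchase_amt"])

print(len(combined))  # 4 usable rows survive
```

Real pipelines do the same thing at scale, which is where the databases, storage, and data engineers Kobielus mentions come in.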
Third, organizations need data scientists, who are expensive resources. They need to find enough people to manage the whole process – from building to training to evaluating to governing.
“Finding the right people with the right skills, recruiting the right people is absolutely essential,” Kobielus says.
Before jumping into machine learning, organizations should also make sure it makes sense for their business strategy.

Industries like finance and marketing have made a clear case for themselves in implementing Big Data. In the case of finance, it allows them to do high-level analysis to detect things like fraud. And in marketing, for instance, CMOs found it useful to develop algorithms that allowed them to conduct sentiment analysis on social media.
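As a toy illustration of the sentiment-analysis use case, here is a minimal lexicon-based scorer. The word lists and scoring rule are invented for illustration; a production system would use a trained model rather than a hand-built lexicon:

```python
# Hypothetical sentiment lexicons (illustrative only).
POSITIVE = {"love", "great", "excellent", "happy"}
NEGATIVE = {"hate", "terrible", "awful", "angry"}

def sentiment(text: str) -> int:
    """Score a social media post: +1 positive, -1 negative, 0 neutral."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)

print(sentiment("I love this brand great service"))  # 1
print(sentiment("terrible product I hate it"))       # -1
```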
There are a lot of uses for it to be sure, Kobielus says, but there are methods for deriving insights from data that don’t involve neural networks. It’s up to the business to determine whether using neural networks is overkill for their purposes.
“It’s not the only way to skin these cats,” he says.
If you already have the tools in place, then it probably makes sense to keep using them. Or, if you find traditional tools can’t address needs like transcription or facial recognition, then it probably makes sense to go to a newer form of machine learning.
What Should Really Be Keeping Data Scientists Up at Night
While those in the tech industry might be fretting over whether AI will displace the gainfully employed or that there’s a skills deficit in the field, Kobielus has other worries related to data science.
For one, the algorithms used for machine learning and AI are really complex and they drive so many decisions and processes in our lives.
“What if something goes wrong? What if a self-driving vehicle crashes? What if the algorithm does something nefarious in your bank account? How can society mitigate the risks?” Kobielus asks.
When there’s a negative outcome, the question becomes: who’s responsible? The person who wrote the algorithm? The data engineer? The business analyst who defined the features?
These are the questions that should keep data scientists, businesses, and lawyers up at night. And the answers aren’t clear-cut.
In order to start answering some of these questions, there needs to be algorithmic transparency, so that there can be algorithmic accountability.
Ultimately, everyone is responsible for the outcome.
There’s a huge legal gray area when it comes to machine learning because the models are probabilistic: you can’t predict every execution path an application built on ML will take.
“There’s a limit beyond which you can anticipate the particular action of a particular algorithm at a particular time,” Kobielus says.
For algorithmic accountability, there need to be audit trails. But an audit log for any given application has the potential to be larger than all the databases on Earth. Not just that, but how would you roll it up into a coherent narrative to hand to a jury?
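One minimal sketch of what an audit-trail entry might look like: record each model decision along with its inputs and a content digest, so the record is tamper-evident after the fact. The field names and the fraud-model scenario are assumptions for illustration, not a description of any real system:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(model_version: str, inputs: dict, output: str) -> dict:
    """Build one tamper-evident audit entry for a single model decision."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_version": model_version,
        "inputs": inputs,
        "output": output,
    }
    # Hash the canonical JSON form so later edits to the entry are detectable.
    canonical = json.dumps(entry, sort_keys=True).encode()
    entry["digest"] = hashlib.sha256(canonical).hexdigest()
    return entry

# Hypothetical fraud-model decision being logged.
rec = audit_record("fraud-v2", {"amount": 950.0, "country": "US"}, "flagged")
print(rec["output"], rec["digest"][:8])
```

Even this tiny scheme hints at the scale problem Kobielus raises: one entry per decision, across every decision an algorithm makes, adds up fast.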
“Algorithmic accountability should keep people up at night,” he says.
Just as he said concerns about automation are overblown, Kobielus says it’s also unnecessary to worry that there aren’t enough skilled data scientists working today.
Data science is getting easier.
In the early days of the web, developers had to understand underlying protocols like HTTP, but today nobody needs to worry about the protocol plumbing anymore. It will be the same for machine learning, Kobielus says. Increasingly, the underlying data is being abstracted away by higher-level tools that are more user friendly.
“More and more, these things can be done by average knowledge workers, and it will be executed by underlying structure,” he says.
Does Kobielus worry about the job security of data scientists then? Not really. He believes data science automation tools will allow data scientists to do more with less and, hopefully, to develop their skills in more challenging and creative realms.
For 5 key trends to watch for in the next 12 months, check out our new report: 2018 Big Data Trends: Liberate, Integrate & Trust