5 Big Trends in Machine Learning, Artificial Intelligence and Data Engineering from Strata NYC 2018 – Part 2
Here are the five big trends I noticed, along with some of what Paco Nathan and other folks at Strata had to say about them. Last time, we covered trends 1 and 2, which focused on machine learning and the constant stream of data. This time we’re covering 3 and 4.
If you missed part 1, you can find it here.
3. Ethics, Bias and Privacy
Just because you can, doesn’t mean you should.
Nicola Askham, the Data Governance Coach, and I recently did a webinar entitled “AI Without Data Governance is Unethical.” As AI becomes more ubiquitous in our lives, and maybe even a little bit boring, we realize how important it is for the machines we created to treat people fairly. And the more ML and AI are put into practice, the more we see how easy it is for a machine to perpetuate the unfairness of the past.
As Clare Gollnick, CTO of Terbium Labs, pointed out in the Women in Big Data lunch panel – “100% of data is about the past. Unless you want your machine learning model to persist the biases of the past, you need a values-based approach.”
Every dataset is biased in some way. You have to assume that before you even start an ML project. Ethics has to be built in at the ground floor of the data engineering, and it also has to be part of the feedback loop. That human refinement of what is right and fair is needed on the deployment side as well, to keep AI from adversely affecting people. Find the edge cases, test the results for bias, and refine them.
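To make "testing the results" a little more concrete, here is a minimal sketch of one common check: comparing how often each group receives a favorable outcome. It is not something presented at Strata. The group labels, the sample decisions, and the 0.8 cutoff (borrowed from the "four-fifths" rule of thumb) are all assumptions for illustration, and a passing ratio does not prove a model is fair.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of favorable outcomes for each group.

    `records` is an iterable of (group, favorable) pairs, where `favorable`
    is True when the model's decision benefits the person.
    """
    totals = defaultdict(int)
    favorable = defaultdict(int)
    for group, is_favorable in records:
        totals[group] += 1
        if is_favorable:
            favorable[group] += 1
    return {group: favorable[group] / totals[group] for group in totals}


def disparate_impact_ratio(records):
    """Return the lowest group selection rate divided by the highest."""
    rates = selection_rates(records)
    highest = max(rates.values())
    if highest == 0:
        # Nobody received a favorable outcome, so no group was favored.
        return 1.0
    return min(rates.values()) / highest


# Hypothetical model decisions: (group label, favorable outcome?)
decisions = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)

rates = selection_rates(decisions)
ratio = disparate_impact_ratio(decisions)

# Assumed threshold: flag the model for human review below four-fifths.
flagged = ratio < 0.8

print(rates, ratio, flagged)
```

A check like this belongs in both places mentioned above: once before deployment, and again in the feedback loop on live decisions, since the data a model sees in production can drift away from what it was tested on.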
Privacy becomes key. The direct EEG-based brain interface is great, until someone decides they can arrest you because you have the brain patterns associated with people who are about to commit a crime. Also, the only way they can build this wonderful interface that lets you think commands to control a device, is if the designers have access to large datasets of people’s brain patterns.
“Using brain-based data, of low quality, and possibly normed on biased datasets. Our laws don’t give us much protection against this.” – Amanda Pustilnik, Professor of Law at University of Maryland
The current applications of data analysis in our legal system don’t exactly provide a warm feeling of trust that our government will treat us fairly. Pulitzer Prize-winning Wall Street Journal technology reporter Julia Angwin gave a particularly moving keynote presentation about the uneven application of punishment and other racially biased practices, justified now by machine learning.
In her many years of reporting on tech industry scams, she noticed that only two people were sent to prison due to something she wrote. Both of them were black men. White men had committed similar crimes, and she had exposed them in similar articles, but none of them went to jail.
Bias is often found as a form of cutting a break, or offering forgiveness, but only for some. That bias is as clear and enraging as a slap in the face when you see the recidivism analyses that have sent so many to jail for longer and longer sentences, but only if they’re not white.
“Skewed algorithms for recidivism give ridiculously high breaks to white people. It’s forgiveness through data, but only for some.” – Julia Angwin
While this is one of the most heinously unfair examples of machine learning abuse, don’t assume that bias is only a concern inside the criminal justice system. It happens every day in regular business analyses, all too often in subtle and maybe almost boring ways.
“Bias can often look like a break for one group that another group doesn’t get. People in white neighborhoods get charged lower insurance rates, even when their risk calculations are the same.” – Julia Angwin
While the rest of these trends are about making machine learning and artificial intelligence faster, more efficient, and more effective, it is on all of our shoulders as data engineers, data scientists, and members of all the professions surrounding these fields to make damn sure we’re not discriminating more efficiently, or enforcing bias at a faster pace.
4. GPUs and Custom Hardware
Even faster, more efficient data crunching.
We’re starting to see a lot of changes in the hardware space to accommodate the demands of AI and ML projects. GPUs are faster than general-purpose CPUs at linear algebra. There were a lot more vendors out on the show floor this year taking advantage of GPUs with their products. TensorFlow, one of the most popular machine learning frameworks, has both a CPU and a GPU implementation, and many argue that is why it’s become so popular.
But software written to exploit existing hardware such as GPUs isn’t the only hardware-related trend. We’re seeing more and more customized hardware now, and I suspect we will see even more of it in years to come.
ASICs are a big thing now. For folks who aren’t familiar, here’s a definition of ASICs from Wikipedia:
“An Application-Specific Integrated Circuit (ASIC), is an integrated circuit (IC) customized for a particular use, rather than intended for general-purpose use. For example, a chip designed to run a digital voice recorder or a high-efficiency Bitcoin miner is an ASIC.”
“Now, they’re really coming out with ASICs that can do more advanced linear algebra at enough scale that you don’t have to go across the network. That’s the game.” “Hardware is moving faster than software and software is moving faster than processes.” – Paco Nathan
That is how it always seems to happen. The hardware improves, the software eventually improves to take advantage of the new hardware, and eventually the business processes get to the point where they take advantage of the new software.
The irony to me is that a lot of the Hadoop and big data processing revolution came about by making it possible to use inexpensive commodity hardware to crunch massive datasets, making processing at that scale affordable. Now, we’re seeing hardware customized and optimized for specific machine learning tasks.
“We’re moving towards a streaming world, and I think we’ll see much more math intensive, much more bizarre hardware. Brace yourselves.” – Paco Nathan
The new way of computing is coming whether you’re ready for it or not.
Ziya Ma, VP of Software and Services Group at Intel, announced another advance: Persistent Memory, which is memory with the size and persistence of storage.
Check back on Monday for the final trend from this year’s Strata Data conference.
Also, make sure to download our white paper on Why Data Quality Is Essential for AI and Machine Learning Success.