5 Big Trends in Machine Learning, Artificial Intelligence and Data Engineering from Strata NYC 2018 – Part 1

On the last day of the Strata Data conference in New York City last week, right after he gave his presentation, I sat down with Paco Nathan, one of the great minds in the field of AI, who lists his job title on LinkedIn as “Evil Mad Scientist” at Derwen, Inc. Since he’s actually a pretty nice guy, I don’t think I’d agree with the “Evil” part, but the rest pretty much fits. We’ll post the full interview soon, but in many ways it turned into a discussion of the industry trends we’ve seen over the decades we’ve both spent in this field. I want to share some of his thoughts and some of mine on the highlights of the conference and the directions we see practical data science applications going.

One thing I want to mention first is something I wouldn’t call a trend so much as a sign of a field becoming more mature. Hilary Mason, GM of Machine Learning at Cloudera, said, “Data science will be mainstream when it becomes boring, no longer newsworthy. That’s what success looks like.”

I don’t think we’ve made it to boring yet, but AI and ML are certainly becoming prevalent in more and more aspects of our lives. Here are the five big trends I noticed, and some of the things that Paco Nathan and some of the other folks at Strata had to say about them:

1. Human in the Loop

Human and machine teams are better than humans or machines alone.

When you’re trying to accomplish something hard, something with a lot of uncertainty, there’s a sort of barrier that even experts hit. Getting over 95% accuracy becomes virtually impossible. You can build a machine learning model for many of those difficult, uncertain tasks, but it will also top out at about 95% accuracy. The most effective strategy is not to use a machine or a person alone to solve the problem. It’s to use a team with both working together.

“We can try to build better and better models, or we can throw mountains of data and train train train, but you’re fighting this diminishing returns curve. Whereas if you have human expertise handling edge cases, along with the machine learning, augmenting that. You kind of get the best of both worlds. You’re not fighting the diminishing returns. Instead, you get past that 95% accuracy barrier that we see in so many domains.” – Paco Nathan
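One common way to put this into practice is to route low-confidence predictions to a person. The sketch below is only an illustration, not something shown at the conference: `route`, `toy_model`, `toy_reviewer`, and the 0.95 threshold are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Decision:
    label: str
    source: str  # "model" or "human"


def route(
    item: str,
    model: Callable[[str], Tuple[str, float]],
    ask_human: Callable[[str], str],
    threshold: float = 0.95,  # assumed cutoff; tune per domain
) -> Decision:
    """Accept the model's answer when it is confident, else ask a person."""
    label, confidence = model(item)
    if confidence >= threshold:
        return Decision(label, "model")
    # Edge case: the model is unsure, so a human expert decides.
    return Decision(ask_human(item), "human")


# Hypothetical stand-ins for a real classifier and a real review queue.
def toy_model(text: str) -> Tuple[str, float]:
    return ("spam", 0.99) if "free money" in text else ("ham", 0.60)


def toy_reviewer(text: str) -> str:
    return "spam" if "prize" in text else "ham"


easy = route("free money now", toy_model, toy_reviewer)
hard = route("claim your prize", toy_model, toy_reviewer)
```

Here the easy message is labeled by the model, while the ambiguous one falls below the threshold and goes to the reviewer. In a real system, the reviewer's answers would typically also be saved as new training data.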

And, since we’re talking about loops …

2. Constant Streams and Feedback Loops

Data is a constant stream, and processing is a loop with response, feedback, refinement.

The old concept of data processing was a linear, left-to-right, input-then-output kind of thing. You think, I’m going to run a job, and I’m going to get some results. But all data is streaming by nature. When you query, you’re querying either a time slice of the ongoing stream or the current state.

“You need to know both current state and history. Example: What is my current balance? What transactions were made on my account?” – Gwen Shapira, Product Manager, Confluent

(Image from Gwen Shapira’s Strata presentation)
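The bank account example can be sketched in a few lines: the history is an append-only log of events, and the current state is just that log folded into a single value. This is a minimal in-memory illustration with hypothetical names (`Transaction`, `current_balance`, `history`, `balance_as_of`), not a description of how Kafka or any particular product stores data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Transaction:
    at: datetime
    amount_cents: int  # positive = deposit, negative = withdrawal


def current_balance(log: List[Transaction]) -> int:
    """Current state: every event in the stream folded into one number."""
    return sum(t.amount_cents for t in log)


def history(log: List[Transaction], start: datetime, end: datetime) -> List[Transaction]:
    """History: a time slice [start, end) of the ongoing stream."""
    return [t for t in log if start <= t.at < end]


def balance_as_of(log: List[Transaction], when: datetime) -> int:
    """State at an earlier moment, rebuilt by replaying events up to `when`."""
    return sum(t.amount_cents for t in log if t.at <= when)


# Made-up sample events for illustration only.
log = [
    Transaction(datetime(2018, 9, 10), 10_000),
    Transaction(datetime(2018, 9, 12), -2_500),
    Transaction(datetime(2018, 9, 14), -1_500),
]

balance_now = current_balance(log)
mid_week = history(log, datetime(2018, 9, 11), datetime(2018, 9, 13))
balance_on_12th = balance_as_of(log, datetime(2018, 9, 12))
```

Keeping the log rather than only the latest balance is what makes both questions answerable: the state can always be recomputed from the history, but not the other way around.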

This trend is exactly why Syncsort just announced our true streaming Change Data Capture with Kafka and Amazon Kinesis support. No more micro-batches. Also, be sure to tune in to our webinar this week on the newest features in our data engineering software, including the advantages of true streaming change data capture as the backbone of the enterprise.

In machine learning, this circular, always-on concept is essential.

“The real work is not developing the machine learning model. The real work is once you put it into production, what you have to do to make sure that it’s right. And that’s ongoing.” – Paco Nathan
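As a rough illustration of that ongoing work, the sketch below tracks a deployed model's accuracy over a sliding window of recent predictions and flags when it drops. The class name `RollingAccuracyMonitor`, the window size, and the accuracy floor are assumptions made for this example. Real monitoring usually also watches input drift, latency, and more.

```python
from collections import deque
from typing import Optional


class RollingAccuracyMonitor:
    """Tracks accuracy over the most recent `window` labeled predictions."""

    def __init__(self, window: int = 100, min_accuracy: float = 0.9) -> None:
        self._outcomes = deque(maxlen=window)  # True = prediction was right
        self.min_accuracy = min_accuracy

    def record(self, predicted: str, actual: str) -> None:
        """Feed back the true outcome once it becomes known."""
        self._outcomes.append(predicted == actual)

    @property
    def accuracy(self) -> Optional[float]:
        if not self._outcomes:
            return None
        return sum(self._outcomes) / len(self._outcomes)

    def needs_attention(self) -> bool:
        """True once the window is full and accuracy is below the floor."""
        full = len(self._outcomes) == self._outcomes.maxlen
        return full and self.accuracy < self.min_accuracy


monitor = RollingAccuracyMonitor(window=4, min_accuracy=0.75)
for predicted, actual in [("a", "a"), ("b", "b"), ("a", "a"), ("b", "b")]:
    monitor.record(predicted, actual)
healthy = monitor.needs_attention()

# The world shifts: the two newest predictions are wrong.
monitor.record("a", "b")
monitor.record("b", "a")
degraded = monitor.needs_attention()
```

The loop never ends: each flagged drop is a cue to investigate, retrain, or hand more cases to people, and then the monitoring continues.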

“A lot of the concept of neural networks came from jumping from control systems concepts to biology.” Biology is messy. There are no clear answers. There’s no starting point, no input to output. “Organisms don’t work that way. They have to live all the time.”

One of the most interesting keynotes was by Amber Case, Research Fellow at MIT Media Lab and author of Calm Technology and Designing with Sound. She talked about how we always design for analysis on a small screen, even though, as organisms, we’re inundated with audio data all the time. Even on an airplane flight, we can end up exhausted because we can’t turn the audio off, and the constant stream of input has to be processed.

“These kinds of audio analyses can focus attention on only the important aspects of something, reduce alarm fatigue.” – Amber Case

Reducing alarm fatigue can save lives by helping nurses react to serious alarms faster. It also gave me a helpful hint: noise-cancelling headphones can reduce travel fatigue.

A business is just like an organism. It has a lot of complex processes and inputs it has to deal with constantly.

Another fascinating keynote was about the new neural interfaces that can take directions straight from your brain. This interface can take the stream of brain waves and use it to let a completely paralyzed person walk. A side-effect of biology not being a one-way linear system is that the neural feedback from walking can actually improve the person’s physical ability to function.

The tricky aspect of this miraculous new technology is that it’s based on patterns of our own brain waves. When our own brains are the data source, that raises a whole new set of ethics questions. That brings me to the next, and possibly the most important, AI and ML trend: a focus on ethics, bias and privacy.

Tune in tomorrow for part 2 and for more of this year’s trends from the Strata Data conference!

Also, make sure to download our white paper on Why Data Quality Is Essential for AI and Machine Learning Success.
