At this year’s Strata Data Conference in New York City, Syncsort’s Paige Roberts sat down with John Myers (@johnlmyers44) of Enterprise Management Associates to discuss what he sees in the evolving Big Data landscape. In the first part of their discussion, Myers pointed out a shift away from technology and toward business value, as well as some advantages of in-memory processing for machine learning.
In today’s conversation Myers touches on dealing with cultural pushback against machine learning applications, how to get machines and people working together to take advantage of the strengths of each, and Moneyball.
Paige Roberts: A few years back, I saw a use case where in-memory processing was a recommendation engine for people who were buying cars at auctions. And it was like, this is the amount you should pay. This much, not any more than this.
John Myers: Right.
It was a brilliant, machine learning kind of capability, and no one used it. Because it was a black box.
The brand-new guys who knew nothing, they used it. Anybody who knew anything wouldn’t use it because they wanted to trust their own judgment. What made it work was when they added three or four reasons. Like, “this is why.” And then the new guy was all over it, which he already was. But even the experts suddenly got better margins because they could use their own knowledge, plus the machine knowledge. It was awesome.
I think that that gets us kind of back to… I’m a big movie and sports guy. So, I’ll talk about Moneyball.
In Moneyball, the Oakland A’s baseball team leverages statistics (vs. traditional human analysis) in an attempt to improve their season record.
One of the problems of the original part of the Moneyball concept in baseball was that this laptop is going to replace me, a guy who’s been watching baseball for 20 years. I know what a baseball player looks like. And they would get a little…
They would get that cultural pushback. I think the best way to use things like Moneyball, things like recommendations, is to say, “Hey, I’m not going to tell you which one to use. I’m going to use this to augment your job and, like you said, give you some reasons. We’ll give you some choices. Hey, here’s top five choices here. You pick any…”
Give him knowledge.
Right. “These five met our base criteria. Go ahead and pick anything you want.”
That person now feels empowered. They feel like they have another tool in their toolbox. Not thinking “a computer program is trying to replace me.”
A computer program is never going to be as good as a person in some things. And there are other things where a person is never going to be as good as a computer program. Where you really go to the next level is when you can get them working together. That’s really the exciting part.
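The "top five that met our base criteria, plus a few reasons" idea can be sketched in a few lines of Python. This is only an illustrative sketch, not the system described in the conversation: the `Candidate` class, the `shortlist` function, the score threshold, and the sample cars are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    score: float  # hypothetical model score, higher is better
    reasons: list[str] = field(default_factory=list)  # the "this is why"


def shortlist(candidates, min_score, top_n=5):
    """Return up to top_n candidates that meet the base criterion, best first."""
    eligible = [c for c in candidates if c.score >= min_score]
    return sorted(eligible, key=lambda c: c.score, reverse=True)[:top_n]


# Made-up sample data for illustration only.
cars = [
    Candidate("Car A", 0.91, ["low mileage", "clean service history"]),
    Candidate("Car B", 0.42, ["high mileage"]),
    Candidate("Car C", 0.77, ["popular model", "priced below recent sales"]),
]

# The person still makes the final pick; the program only narrows the field
# and shows its reasoning.
for c in shortlist(cars, min_score=0.6):
    print(f"{c.name} ({c.score:.2f}): " + "; ".join(c.reasons))
```

The design choice here is that the program never returns a single answer. It returns a ranked list with reasons attached, so the expert can combine it with their own judgment.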
Well, another thing I’m interested in here at the show is machine learning. Now, there’s a lot of people who have kind of gone hook, line and sinker over into machine learning. There are things that machine learning can do where if you try to stick a person in the middle, you’ll slow down the process. To the point where you’re not going to be able to make it. But how do you build the models for your machine learning? How do you validate those models?
You’ve got to have a person.
I think that somewhere along the line it makes sense to insert a person in the process. Once you’ve validated a model, once you’ve validated an algorithm and said, these are the parameters that we want you to work inside of, let it go run and have fun with the machine learning and…
You must go back and revalidate, though, on a fairly regular basis. You can’t just let it run because then you end up…
I’m with you. But I think that we’re still in those early stages.
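One way to picture "validate once, then revalidate on a regular basis" is a periodic check that compares a model's recent accuracy with the accuracy a person signed off on. The sketch below is an assumption-laden illustration, not a method described by either speaker: the function names, the 0.05 tolerance, and the sample labels are all hypothetical.

```python
def accuracy(predictions, actuals):
    """Fraction of predictions that match the observed outcomes."""
    if len(predictions) != len(actuals) or not actuals:
        raise ValueError("predictions and actuals must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predictions, actuals))
    return correct / len(actuals)


def needs_human_review(baseline_accuracy, predictions, actuals, tolerance=0.05):
    """Flag the model when recent accuracy falls below the validated baseline.

    baseline_accuracy is the figure a person approved at validation time;
    tolerance is a hypothetical allowance for normal variation.
    """
    return accuracy(predictions, actuals) < baseline_accuracy - tolerance


# Made-up recent results from a quality check: what the model said vs. what happened.
recent_predictions = ["pass", "pass", "fail", "pass"]
recent_actuals = ["pass", "fail", "fail", "fail"]

if needs_human_review(0.90, recent_predictions, recent_actuals):
    print("Accuracy has dropped; send the model back to a person for revalidation.")
```

Run on a schedule, a check like this lets the model work unattended most of the time while still pulling a person back in when its results start to slip.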
Someone I talked to on the plane coming down was like, “Yeah, there’ll be a time where we won’t have to send people into factories to evaluate quality.” And I was like, “If you understand that nothing changes.” But having that person who has that imagination to go, “Hey, this would be a way for us to improve.” Or, “This would be a new way for us to do that.”
Or we’re doing something new. I don’t have any past data to base this on.
But I have years of experience in this area. So, I know that if you do this, this and this, your quality will be better.
Exactly. But there are people who have swallowed the whole machine learning idea hook, line and sinker, who believe they can have those adaptive algorithms that can do those types of things. But I think you’re right.
From a cultural perspective, let’s say I’m the guy who runs the plant. I’m not going to say no from a cultural perspective. I’m going to say, “If something goes wrong, it’s on me.” And I want to have that validation. Trust but verify.
I need to know. Don’t just tell me, “Oh, the algorithm told me so!” That’s not going to fly with my CEO when he says, “Why did that break?”
But, if you have those visionary people who say, “Yeah, I want the change. Let me trust and verify what it does,” then they can be more comfortable with it as it goes on. They can have that competitive advantage of that disruption to say, “I’m going to take the sensor data from my manufacturing plant, and I’m going to use it to say this is going to be my quality.”
And if I get to a point where I can’t really improve my quality, then yes, now you can start to pull the people out and assign them to new roles, and have them find the next piece. Because in any process you’re going to run into the five nines, or Six Sigma, or whatever.
In quality, there’s a limit. Yeah.
And say, “Okay, I’ve hit that. Let’s figure out the next one that will allow us to be great.”
I think that’s one of the great things about machine learning, and one of the things that we talk about with Big Data: What are the things that you hate dealing with in your day? And let’s have Hadoop or machine learning or whatever the technology is…
Take care of that for you.
Take it away and say, “You focus on something else.”
Now, culturally, we always run into the people who go, “So it’s going to replace me?” or, “Where is my job?” And like I said, I like the discussion, “What do you hate most? Do you hate moving files? Do you hate doing that base discovery?”
When I teach classes, a lot of times they’ll say documentation. I’ll ask, “Does anybody like documentation?” I’ve got 50 people in the room, and nobody raises their hand…
I started out … [Laughter] writing documentation.
I’ve done documentation, too. But when we document a new system, that’s fine. When we’re documenting an old system, it’s like…
Oh, I’ve got to update the six words in paragraph 25 on page 32. Really? [Laughter]
Exactly. We used to give that to the new guy. It’s kind of like technical hazing. You get to do the documentation.
Good luck with that. [Chuckles]
There’s no better way to teach somebody. But it requires a lot of time, and it’s manual. If we’ve got the metadata, if we’ve got that data and we say, “just print a report,” now we’ve got our up-to-date documentation instead of traipsing through whatever directory structure… because I used to be a UNIX SysAdmin: ls -al | grep, blah, blah, blah, blah, blah.
Instead of doing that, they’re reading a well-formatted document that tells them the exact format of the system. We’ve taken that away, and now that person can take that data, take that documentation and do something else, not be crawling through whatever directories all day.
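As a rough illustration of "we've got the metadata, just print a report," the Python sketch below turns a small metadata structure into a readable text document. The metadata layout, the `render_report` function, and the sample table are hypothetical; a real system would pull this information from its own catalog or metadata store.

```python
def render_report(tables):
    """Render table metadata as a plain-text document, one section per table."""
    lines = []
    for table in tables:
        lines.append(f"Table: {table['name']}")
        lines.append(f"  Description: {table['description']}")
        for col in table["columns"]:
            # Pad column names so the types line up in the report.
            lines.append(f"    {col['name']:<14} {col['type']}")
        lines.append("")  # blank line between tables
    return "\n".join(lines).rstrip()


# Made-up metadata for illustration only.
metadata = [
    {
        "name": "customers",
        "description": "One row per customer account",
        "columns": [
            {"name": "customer_id", "type": "integer"},
            {"name": "city", "type": "text"},
        ],
    },
]

print(render_report(metadata))
```

Because the report is generated from the metadata, rerunning it after a change keeps the documentation current without anyone editing it by hand.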
And even if they’re a documentation person, you’re not replacing them. This is just documenting all those nitpicky little things that used to drive you crazy. You can work on the high-level stuff that explains the whole system – the more interesting part of the job.
Yeah. So, I think those are some great opportunities that impact the business.
Now, we’ve talked a bit about documentation or technology or whatever. But if we raise that up and say, “Hey, this is the current status of our customers.” We can give that to customer care. Or, “We have our prospects.” We can give that to sales and marketing.
Yeah. I have to know my customers. What is the tedious part of getting to know who this customer is: Where do they live, how much stuff have they bought over the last ten years?
One popular area of adoption for machine learning is mainframe operations. Read Syncsort’s eBook Mainframe Meets Machine Learning to learn about the most difficult challenges and issues facing mainframes today, and how the benefits of machine learning could help alleviate some of these issues.
Be sure to read Part 3, where John and Paige will discuss the 80/20 rule of data science: most data scientists spend 80% of their time getting data ready for analysis, rather than doing what they do best.