Expert Interview: Doug Cutting, Cloudera Chief Architect and Hadoop Co-Founder: Part One
Every year, around Strata + Hadoop World, Cloudera hosts the Data Impact Awards, an award ceremony to congratulate customers for their most impressive or impactful implementations. Partners are invited to nominate customers, and industry analysts and experts judge the nominees. Everyone gets together to hear the stories of how Hadoop has changed businesses and lives and applaud the efforts of dedicated developers. During the celebration, Syncsort’s Paige Roberts dragged Cloudera’s Doug Cutting off to a quiet spot to chat.
Syncsort’s own Paige Roberts sits down one-on-one for a candid discussion with open-source guru and Hadoop creator, Doug Cutting of Cloudera.
Paige: You’re fairly famous now, but how did you get started in all this?
Doug: I spent a number of years in the software business in Silicon Valley. First, I was working in research at Xerox PARC, then at Apple, and then at a company called Excite in the ’90s. So I was always building search engines. I had worked on search technologies for a long time. I was an experienced software developer who was also working on problems that involved lots of data, trying to build scalable solutions that weren’t amenable to using a relational database. So that was my technical background.
In the late ’90s I wrote a search engine on my own time, in Java, called Lucene. Then, in 2000, I released it as open source. At that point, I learned the ability of open source to make a technology into a standard. Lucene really took off. It was good technology, but it was also this delivery method of open source, this building a community at Apache, that really was predominantly responsible for its…
Doug: … dominating success.
Doug: A couple of years later, trying to build a distributed version of Lucene that could crawl the internet, we came across Google’s papers about MapReduce and distributed file systems. We realized that these were the right techniques, and that no open source version of them existed.
This would be a useful technology: great potential for another open source project, a general utility that a lot of people could share if it were available as open source. So, Mike Cafarella and I got together and worked on that for a few years. By 2005, we had something up and running. We managed to rope Yahoo into devoting a big team to getting it to be scalable and really fulfill this promise of delivering an open source solution for scalable computing which we called Hadoop.
Paige: I just interviewed Owen O’Malley recently.
Doug: Right. He was a key part of that team that took what Mike and I had written, and got it to the point where it could really be used by anyone. The reason I was the guy who was able to get this is, I had a combination of technical experience with building scalable systems that weren’t relational databases, as well as experience with open source. I recognized that the combination would be really useful for a lot of things.
I didn’t realize just how useful it would be. That took the founders at Cloudera. I was not a founder. They really thought this would be useful to lots of other companies, like those we’re seeing here today.
Paige: At the awards?
Doug: At the awards. In banking, in transportation, and agriculture – all these crazy sectors are now finding these technologies useful. The founders of Cloudera were the first people to realize that there was a great opportunity for that. They started the company in, I guess it was, ’08, and I joined in ’09. It’s really been phenomenal to see this growth.
Paige: It really would have been hard to predict that explosion of growth. It’s not something you could have seen coming.
Doug: In retrospect, I think that what is now called the digital transformation, which almost every industry is going through, was predictable. There were people predicting that.
As you know, Moore’s Law gives us cheaper and cheaper hardware. People are using it in more and more places, and data comes as a byproduct. That data can improve your business because you can use it to understand what you’re doing, and then you can optimize how you are doing things and improve the quality. It’s a really great, great thing to have. And I think you could have seen that data was going to become such a key asset to businesses across industries.
Put that together with these Big Data tools. The existing enterprise software universe wasn’t going to satisfy those needs, for a variety of reasons. For one reason, the hardware and software were way too expensive and too specialized for specific tasks, which weren’t the tasks that people needed to solve. People needed lower-cost solutions. They needed things that were more scalable and more general purpose. So it was really the right time for these technologies.
I think all the evidence was there. I simply didn’t put it together. But I think someone could have.
Paige: Well now you have some perspective on the trends from inside Cloudera and from your history. Where do you think it’s going from here?
Doug: I think we’re really seeing that most of the growth in industry is coming from these technologies. So I think we’re still in the early stages of industries becoming data intensive. I think we’re seeing this really driving growth and improvement and optimization and – what’s the word the economists use? Productivity.
We’re seeing some real advances in productivity. In some ways, it was predicted a long time ago that we would see improvements in productivity from competition. Then some people were disappointed and said, “Oh. The paperless office is only slightly more productive.” I think people didn’t realize all the places that technology currently is used and touches. And we’re still only learning that. So that’s predominantly what we’re going to see.
And I think this open source ecosystem is really the appropriate way to build the technology. We don’t know what exact tools people are going to need. We need people to experiment – people at universities and in companies – to try building something that they think they need and see if other people need it. Then we’ve got this ecosystem in which we can evolve the right tools. If several institutions find it useful, then it, you know…
Paige: Takes off?
Doug: Takes off and becomes a standard. We’re certainly seeing this again and again – rapid evolution for improving software that matches the needs of industries. And that’s pretty cool. [laughter]
Paige: Yes, it is!
Doug: So, as for predicting where that’s going to lead, what industries are going to be huge, or what new technologies are going to drive them, I don’t think anybody can predict that. I think maybe with hindsight you can say, “Oh yeah, this should have been obvious.” Like they’re doing about this thing.
Paige: [Laughter] Yup, crystal balls that look backwards are really clear.
Doug: But, I do think those are the trends that are going to be driving things: this generation of data at scale, and then the use of it to improve productivity.
Read Part 2 of this interview series where Doug discusses what he is working on now, the launch of Apache Spot, and how to help organizations stay on track with open source, both on-premise and in the cloud.