At a recent Confluent partner event, Syncsort’s Paige Roberts sat down with Confluent Co-Founder and CTO Neha Narkhede (@nehanarkhede) to discuss Apache Kafka™, Confluent, the future of streaming data processing, and Women in Big Data.
In this first part, Neha explains the origins of Kafka and Confluent as a company, the trends that led to its founding, and the advantages the platform brings to the streaming data world.
Paige: So tell me who you are, and tell me about yourself.
Neha: My name is Neha Narkhede. I’m currently the Co-founder and CTO at Confluent. Prior to Confluent, I built Apache Kafka with two other people at LinkedIn. We open sourced Apache Kafka, and it became very popular, adopted across thousands of companies.
I thought that if there was going to be a company around Apache Kafka, then we should be the ones who created it. So I suggested it to my colleagues, and the three of us ended up creating Confluent two years ago. Two years down the line, I’m really enjoying building the company – every single aspect of it. I specifically run technology, engineering, customer operations and professional services.
That’s great. Very cool. Who are the other two people who founded Confluent with you?
Jun Rao and Jay Kreps. They created Apache Kafka with me and are also co-founders of Confluent.
I’ve been seeing a large uptake of Kafka. It seems like Kafka is becoming the central nervous system for almost every kind of company.
There you go. You picked up the term!
I did. I wrote a post on it for your blog recently using that analogy. Why do you think it took off so much? What is it about Kafka that appeals to everybody?
You know, there are several reasons. About six years ago, we saw a growing trend in the industry: companies are becoming a lot more digital. That means data is not just created by human actions; it’s also created by machines. As a result, data is orders of magnitude larger than ever before, and legacy systems are falling apart.
Also, the typical way for businesses to know what’s going on was through batch processing once a day at midnight. That’s too slow now. At LinkedIn, we were responsible for the company’s data systems and we said, “Wow. We should actually know how LinkedIn is doing, how users are accessing the website, in real time.” So we looked at everything in the space that was available, and there wasn’t really a good solution for it.
It just turned out that this is an industry-wide trend. People want visibility into the health of their business. They want to know about new market opportunities and new cost-saving opportunities – and they want to do that in real time. So we open-sourced Apache Kafka.
Really, I think the rise of Apache Kafka has been driven by the fact that it solves an industry-wide problem, and it does that as an open source technology. So, it’s very easy to adopt and get started.
I think the third reason for Kafka’s popularity is that it was put to very large-scale use at LinkedIn. Before we released it, it went through a sufficient amount of testing and operationalization, which resulted in the user experience being very good.
It was already solid.
It was already solid, yes. I think even when we created the company two years ago, the Kafka experience was already pretty solid. All of those factors – that it’s open source, that it just works and that it solves a real problem companies are facing – contribute to Kafka’s success today.
There’s also a last one which is sort of surprising to me. Since Confluent was created, there’s been a marked increase in terms of Kafka adoption. I think people are betting on it because there’s an enterprise supporting it. I find that interesting.
The technology enabled the corporation and the corporation is enabling the technology.
Yeah, it’s like a good network effect, right?
For us, or at least for me, creating the company is sort of the vehicle to the end mission. And the end mission is to be able to put this Kafka-based streaming platform in the heart of every company.
Ah! That’s ambitious.
And really, the way to do that is to create a company around it. It’s very much like the mission required creating a company, and that is why we’re doing that.
A great part of my job is to learn about new tech. I read the book authored by Confluent Co-founder Jay Kreps. He came to Data Day Texas, and I attended his lecture. I’ve spent time with customers using Kafka. I think I’ve got a pretty good handle on Kafka.
Tell me about the Confluent platform as far as what it brings to the table in addition to Kafka.
There are two things. First, there’s Confluent Open Source, which is a 100% open source distribution of Apache Kafka. And then there’s Confluent Enterprise, which is Confluent Open Source with proprietary value-add features.
So, if you think of Apache Kafka as a powerful engine, you realize what you really need is a car. The Confluent platform is the car.
Confluent Open Source is the set of tools and software that you need to use alongside Kafka to really succeed with it in a company. Confluent Enterprise provides users the out-of-the-box capability to connect it with existing systems, use different kinds of clients and connectors, and to secure and monitor Kafka.
One of our recent big announcements is Confluent Enterprise. We just took the feature set, and made it a lot more operational, more rock solid, especially for companies who are serious about using Kafka in production. They need things like multi-data center capability, much more significant monitoring capability, and a lot of operational smarts. That goes in the enterprise version.
In part 2 of this conversation with Neha Narkhede, we’ll look at the Schema Registry – what it is, why it’s needed, and where it fits in the Kafka ecosystem. Ms. Narkhede will also talk about the big announcement Confluent made earlier this year.