Stream processing and real-time analytics are the driving trends behind the fast acceptance and general popularity of Kafka.
Another big data product for the Hadoop ecosystem, born at LinkedIn and nurtured to adolescence by the open source community at Apache, Kafka works much like the organization’s “central nervous system,” as it was described in Computerworld Magazine. It collects large quantities of data in real time as it streams in via user interactions, logs, application metrics, IoT devices, stock tickers, etc., and delivers it as a real-time data stream ready for use.
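To make the "collect it as it streams in, deliver it as a stream" idea a little more concrete, here is a minimal in-memory sketch of the pattern. It is a toy, not Kafka's API: the `ToyLog` class, its `publish` and `poll` methods, and the source and consumer names are all hypothetical, invented purely for illustration. It assumes a single append-only log in which each consumer tracks its own read position.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for one stream: an append-only list."""

    def __init__(self):
        self.records = []                # records are only ever appended
        self.offsets = defaultdict(int)  # next unread position, per consumer

    def publish(self, source, payload):
        """Append one record and return the position it was stored at."""
        self.records.append((source, payload))
        return len(self.records) - 1

    def poll(self, consumer, max_records=10):
        """Return this consumer's unread records and advance its position."""
        start = self.offsets[consumer]
        batch = self.records[start:start + max_records]
        self.offsets[consumer] = start + len(batch)
        return batch


log = ToyLog()

# Many kinds of sources write into the same stream as events arrive.
log.publish("clickstream", {"user": 42, "page": "/pricing"})
log.publish("app-metrics", {"cpu": 0.71})
log.publish("iot-sensor", {"temp_c": 21.5})

# Each consumer reads at its own pace without affecting the others.
dashboard_batch = log.poll("dashboard")
warehouse_batch = log.poll("warehouse", max_records=2)
```

Because reading never removes a record, the dashboard and the warehouse loader each see the full stream, and a slow consumer simply picks up where it left off on its next poll.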
LinkedIn has made something of a habit of birthing uber-good big data processing and streaming data products and then letting the Apache open-source community adopt and raise them. Kafka is one of these adopted kids.
Kafka Joins the Hadoop Ecosystem by Way of LinkedIn
Kafka isn’t the first Hadoop-related, open-source brainchild to spring forth from the innovators at LinkedIn. They are also responsible for conceiving Samza and Helix, both of which became Apache Incubator graduates, as well as Voldemort (no, not him).
According to Stephen O’Grady, cofounder and analyst for RedMonk, Kafka is sometimes likened to ActiveMQ and RabbitMQ when it comes to what it does for on-premises data infrastructures, or perhaps to AWS Kinesis in the cloud.
Kafka is Becoming Mainstream within the Data Streaming/Real Time Community
Kafka is no longer relegated to the back closets of big data and the Hadoop ecosystem, either. It’s gone big time. According to a recent poll conducted by Kafka’s unofficial godparents at Confluent, Inc. in Palo Alto, Kafka is already working hard in the enterprise, primarily serving in use cases including:
- Application monitor – 60% of users
- Data warehouse – 51% of users
- Asynchronous application – 47% of users
- System monitor – 39% of users
- Recommendation engine – 35% of users
- Security monitor and/or fraud detector – 26% of users
- IoT applications – 20% of users
- Dynamic pricing – 12% of users
Another survey of Kafka community users indicates that 68 percent are getting ready to take on even more stream processing within the next year, and about 65 percent are preparing to hire workers with Kafka skills over the same period.
According to Brian Hopkins, VP and analyst with Forrester Research, if you aren’t getting down with Kafka, you’re already getting behind the 8 ball. “Up to 2014 it was all about Hadoop, then it was Spark,” Hopkins told Computerworld. “Now, it’s Hadoop, Spark, and Kafka. These are three equal peers in the data-ingestion pipeline in this modern analytic architecture.”
Voldemort is a database: a distributed key-value storage system, not a Harry Potter villain. Gotta hand it to its creators, though; it’s another example of an extraordinarily cool product name.
Getting Kafka Plugged into Your Enterprise Data Infrastructure
Of course, like most of the Hadoop ecosystem, Kafka doesn’t just waltz in and start singing and dancing with your existing infrastructure. It takes some plugging in and tuning first.
That’s where Syncsort comes in. The latest release of Syncsort DMX-h combines Kafka streaming with the rest of your enterprise batch data sources to greatly simplify integration of data streaming in Hadoop, Spark and Kafka.
For more information, check out the webcast, “Simplifying Big Data Integration with Syncsort DMX and DMX-h.”