Takeaways from Strata + Hadoop World 2016 – Customer Use Cases Highlight Apache Spark and Apache Kafka Adoption!
Sessions and conversations during Strata + Hadoop World San Jose confirmed that Apache Spark and Apache Kafka are here to stay. These topics had been trending at previous conferences, but this time we saw plenty of evidence that the technologies have been adopted and are in production at large corporations.
As expected, the sessions were heavily focused on both Spark and Kafka, covering everything from customer use cases to tuning tips.
At Syncsort, we’ve released support for Kafka sources and are about to release support for Kafka targets in a couple of weeks, so I was interested in attending as many sessions on Kafka as I could.
In the very first session I attended on Wednesday, Jay Kreps announced Kafka Streams, which could be a promising stream processing framework for those who want to process Kafka messages without setting up a separate computing framework.
Confluent also announced their new partner program to support the rapidly growing Kafka ecosystem, with Syncsort as one of the first partners.
One of my colleagues focused on attending Spark sessions, since we are putting the finishing touches on the native integration of DMX-h with Spark. I look forward to hearing his takeaways as well.
Another highlight of the conference was the advances in Artificial Intelligence showcased during the keynotes. The amount of data available to Machine Learning, including visual data, has enabled the development of some very cool applications. Examples ranged from finding the jeans you’re looking for at a store, to buying the perfect pair of boots by looking at pictures and telling the online assistant which ones you prefer, to controlling robotic limbs with your thoughts, to estimating the number of casualties in Syria.
It’s also great to see the commitment of the US Government to Big Data, making its data available to generate social benefit and economic value. At previous conferences, we’d heard from DJ Patil of the White House Office of Science and Technology Policy. This time, Bruce Andrews from the Department of Commerce told us about his department’s mission to make data easy to find, access, and understand.
As Jack Norris from MapR put it, the winners will not be the companies with the most data, but the ones that can best combine data and processes. To that end, we at Syncsort are helping enterprises ingest and integrate data from a variety of sources, including the mainframe, and making it easy for them to take advantage of all the technological advances in Big Data. By running our engine natively in MapReduce and Spark, we allow enterprises to develop their data pipelines in a simple point-and-click GUI, which is independent of computing frameworks, and thus future-proof. DMX-h customers can choose whether to run their data pipelines on premises, in the cloud, in MapReduce, or in Spark. By supporting streaming sources, we allow customers to consume or produce Kafka messages without having to write any code. And our newly added DataFunnel enables the ingestion of hundreds of tables in one shot.
In her appearance at SiliconANGLE TV’s theCUBE, Syncsort’s General Manager for Big Data, Tendü Yoğurtçu, shared her thoughts on how Syncsort is helping our customers excel at combining their data and processes: