Big Data’s Back to the Future: Spark Summit East Harkens to Hadoop Conferences of Yore
Despite the excitement and the innovative, cutting-edge Big Data technology at the core of Spark Summit East 2016, everyone I spoke with at the event had the same impression: it reminded them of the Strata+Hadoop World conferences of five years ago. Back then, the Hadoop show was small enough to be held in the same venue as this year’s Spark Summit East (the New York Midtown Hilton). The conference was also aimed at developers rather than business staff. Comments like “You know you are at a developer’s conference when you see code up on the screen during a keynote” captured the slant of both the content and most of the audience.
Tendü Yoğurtçu, Syncsort’s general manager of Big Data, discusses trends and adoption challenges in a rapidly evolving Big Data market.
That said, we all know what happened with Hadoop: both the technology and the conferences grew, and the last two Strata NYC events were held at the enormous Jacob Javits Convention Center. Apache Spark certainly seems to be on the same trajectory.
No one questions that the technology is white hot (pun intended). In fact, a recent Syncsort Hadoop survey found that nearly 70 percent of respondents were more interested in Apache Spark than in any other compute framework, including the incumbent MapReduce. Databricks, the main contributor to Apache Spark and its leading commercial distributor, provided additional adoption data in its recent Spark survey. During his keynote, Databricks CTO Matei Zaharia also showed charts with dramatic increases in Summit attendance, meetup numbers and total Apache Spark contributors from 2014 to 2015.
So what were some of the other conference highlights?
Matei Zaharia also gave a preview of Spark 2.0, which will feature three key enhancements: the next phase of Project Tungsten, which speeds up Spark by working around Java’s memory-management limitations; improvements to Spark’s real-time streaming system; and the unification of Spark’s structured data APIs into a single API. On streaming, a growing use case, he stressed that many applications need to combine real-time processing with batch jobs and interactive queries. He argued that Spark is well suited to this and described how “Structured Streaming,” working together with ETL technology, can meet that need.
Spark Community Edition
Databricks announced a free Community Edition of its cloud-based Big Data platform, designed to give developers, data scientists, data engineers and other IT professionals everything they need to learn Spark. It comes with a set of training resources, including a massive open online course (MOOC) and an introduction to Big Data with Apache Spark.
Apache Arrow
Apache Arrow has been accepted as a top-level project by the Apache Software Foundation. Arrow, a columnar in-memory data format, is designed to speed up Big Data components that work together as part of a larger system. The project is backed by the founders of Dremio, who also support Apache Drill.
Great Interviews on theCUBE
Finally, SiliconANGLE TV’s theCUBE, which consistently broadcasts high-quality executive interviews from top Big Data industry events, conducted 25 interviews at Spark Summit East. One of the most watched (OK, I am biased, but view counts don’t lie!) was the interview with Syncsort’s Big Data GM, Tendü Yoğurtçu, conducted by Wikibon Chief Analyst Dave Vellante and Big Data Analyst George Gilbert. Tendü talked about trends and adoption challenges in a rapidly evolving Big Data market and how Syncsort is addressing them. She pointed out that businesses want to access ALL enterprise data, including IoT, mobile and mainframe data, and need real-time insights for applications such as telecom churn analysis and fraud detection. She also noted that many organizations are looking for a single data hub to access all enterprise data, which demands skills in, and an understanding of, the rapidly changing Hadoop and Spark technology stacks.
All in all, it was a great event, with electricity in the air that matched the growing excitement around Apache Spark. Conference organizers are already working on a business track for future Spark Summits. There is no doubt that Spark is a fast-growing open source force to be reckoned with. I look forward to looking back and remembering when Spark Summit was just like the early days of Hadoop.