An Engineer’s Perspective on Strata + Hadoop World 2015
In my opinion, this year’s conference, hosted by O’Reilly and Cloudera, was even more exciting and informative than usual.
There were a lot of discussions and announcements around security. They ranged from Eddie Garcia’s keynote proposal of a secure-by-default standard to an interesting lunch conversation I had with the founder of BlueTalon (one of the winners of this year’s Startup Showcase) about their security offering. All this focus on security signals that Hadoop as a platform is now mature enough to be used in production environments, even in traditional enterprises with sensitive data.
The Open Data Platform announcement was the focus of several keynotes, many interesting online discussions, and lively lunch conversations with folks from Hortonworks and Pivotal. It will be interesting to see how this offering is received by our customers and throughout the Apache Hadoop ecosystem.
But the most exciting sessions for me were about developments in Apache Spark, and about what’s happening in the ‘real-time’ analytics space.
Tathagata Das from Databricks put it well when he said in his Thursday session that “the key is to blend batch and real-time”. I heard this theme in a lot of discussions. Combining batch processes with real-time data promises to detect issues and opportunities more quickly. As Anil Gadre from MapR put it in his keynote, data to insight is no longer enough: “data to action is key.” The technology required is still maturing, but Apache Kafka seems to be emerging as the winner in the distributed messaging space, and Netflix announced their decision to replace their home-grown Suro with Kafka. The streaming engine question doesn’t seem to be settled yet, and Jim Scott from MapR had a great session comparing Apache Storm, Samza, and Spark Streaming.
As for the Hadoop execution engine space, we hosted a very lively ‘Birds of a Feather’ lunch discussion on Apache MapReduce, Tez, and Spark. There was a lot of excitement around how fast Spark is being developed and how far it came in 2014. Both Netflix and Tencent had great sessions sharing their experiences with Spark. Spark is still not mature enough to be widely adopted in production, but the community seems very engaged in surfacing issues such as fault tolerance, multi-tenancy, and scalability so they can be addressed. The message for me here is that Syncsort’s Intelligent Execution Layer will be very valuable. Thanks to this technology, our customers and prospects can minimize risk, ease Hadoop adoption, and future-proof their applications. DMX-h gives them the freedom to leverage new processing frameworks without rewriting their applications, sparing them from having to acquire constantly changing skill sets.
Finally, I’d like to share some exciting Syncsort appearances. Syncsort and Impetus gave live demos of the integration of DMX-h with StreamAnalytiX at both companies’ booths. Mike Brown showed in his session how Syncsort fits into the architecture of comScore’s validated Campaign Essentials solution. And our General Manager for Big Data, Tendu Yogurtcu, was interviewed on SiliconANGLE’s live-streamed Internet show theCUBE, providing insights on open source, Hadoop adoption, and acquisitions.