Strata + Hadoop World 2017 in San Jose, CA was packed with fresh content. There have been so many advances in Big Data beyond Hadoop that the organizers have decided to rename the conference to Strata Data Conference. Themes that emerged from this year’s conference include a renaissance of machine learning, Big Data in the cloud and purifying Hadoop data lakes.
Machine Learning Renaissance
The biggest theme of this final edition of Strata + Hadoop World was Machine Learning. Mike Olson, co-founder and Chief Strategy Officer at Cloudera, called it a "Machine Learning Renaissance," driven by higher data volumes and lower compute costs. Vijay Narayan from Microsoft gave the example of human DNA sequencing, whose cost dropped from $10M in 2007 to less than $1K today.
The problems being solved with Machine Learning were varied. They ranged from Coursera's challenge of matching 25 million learners with 2,000 course offerings, to improving disease diagnosis through faster DNA sequencing, to detecting fraud in financial transactions, automating vehicles, and combating child exploitation.
In a compelling and entertaining keynote, Second Spectrum CEO Rajiv Maheswaran showed off his company's sports data embedded into videos of NBA games, which helps teams gain new insights and creates a futuristic viewing experience for fans. Machine Learning is also helping advance language translation, disaster response, news reporting, and gene editing techniques.
Big Data in the Cloud
Several Strata + Hadoop World sessions also showcased advances in using the cloud for Big Data. Cloudera's Jennifer Wu and Andrei Savu presented work by the Cloudera Director team and the open source community to address security, performance, and consistency for data engineering workloads on platforms such as AWS and Azure.
Google's Slava Chernyak gave a session on Apache Beam and Cloud Dataflow that introduced the audience to the concept of watermarks in stream processing.
In his appearance on SiliconANGLE TV's theCUBE during the week of Strata + Hadoop World, Syncsort CEO Josh Rogers talks about Syncsort's perspective as a major player in data quality and integration, including how machine learning can be used to raise the bar for data quality. He also addresses the growth in cloud adoption, as more and more organizations implement hybrid environments that combine cloud and on-premise infrastructure.
Flink Enhanced, Ray Debuts
On the distributed computing engine front, Jamie Grier from data Artisans talked about the enhancements in Flink 1.2 that improve event-driven processing. Mike Jordan from UC Berkeley unveiled Ray, which aims to replace Spark for Machine Learning workloads and adopts a TreeReduce paradigm instead of MapReduce.
This golden age of Machine Learning is all about data. At Syncsort, our focus is helping enterprises build data lakes populated with clean data. Syncsort's battle-tested, enterprise-ready, and community-recommended software liberates data from legacy Mainframe and Data Warehouse platforms. With our recent acquisition of Trillium Software, we will bring market-leading data quality and profiling capabilities to help ensure that high-quality Big Data is available in the data lake.
Purifying the Hadoop Data Lake
As Syncsort's General Manager for Big Data, Tendü Yoğurtçu, PhD, discussed in her appearance on theCUBE, data integration carries a hidden cost tied to skill sets and the rapidly changing Big Data technology stack. Syncsort recently announced support for Spark 2.0, and we continue to learn from and work with our partners so that enterprises can design their data flows once and deploy them on any distributed computing engine, on-premise or on any cloud platform.
In her appearance on theCUBE, Syncsort Big Data GM Tendü Yoğurtçu, PhD, talks about industry trends and themes at Strata + Hadoop World, shares her insights on enterprise readiness in the cloud, and touches on how Syncsort's competitors are playing catch-up with its design once, deploy anywhere approach.