The first DataWorks Summit in San Jose was full of exciting announcements to help fulfill Hortonworks’ vision of enabling the Modern Data Architecture. Here’s a look at the conference highlights.
Keynotes from Hortonworks
During DataWorks Summit’s first day of keynotes, Rob Bearden, CEO of Hortonworks, announced the following joint efforts with IBM:
- Hortonworks to adopt Data Science Experience platform from IBM
- Hortonworks to adopt Big SQL
- IBM and Hortonworks collaboration to advance Apache Atlas as enterprise governance
- IBM has standardized on Hortonworks Data Platform
— Paige Roberts (@RobertsPaige) June 13, 2017
Rob also shared some estimates of market size:
- Cloud computing: $380 Billion
- Internet of Things: $1.3 Trillion
- Streaming Data: $210 Billion
- Data Science, Machine Learning: $19 Billion
Hortonworks is focusing on data governance and security with Apache Atlas and Apache Ranger. It is also working on improving the user experience by making it easier to shift workloads to the Cloud. Some of the sessions showcased upcoming improvements in Apache Ambari and the Streaming Analytics Manager (SAM), which is now in technical preview.
Customer Success on Display
The sessions also featured customer stories illustrating the transformation Big Data brought to businesses such as Walgreens, Target, Symantec, Comcast, and Progressive Insurance.
Progressive had a session discussing their journey to the data lake. Big Data architect Krishna Potluri told the story of how their developers initially enjoyed using open source utilities and writing custom code to ingest data from mainframes and legacy databases into Hadoop. But once it came time to hand the code off to another team, a lot of time went into troubleshooting and supporting the process. They decided it was time to invest in a tool.
During one of the DataWorks sessions, Krishna Potluri takes us on Progressive’s “Journey to the Data Lake”
The Progressive team needed a tool that was enterprise-ready, ran natively on Hadoop, could be used by non-programmers, supported all of their data sources, was cloud-enabled, integrated with source control systems like Git or TFS, came with a tight Hortonworks partnership, and offered ‘phenomenal’ technical support. After testing different alternatives, they settled on Syncsort DMX-h.
Krishna reported that data ingestion has gone from days to hours, and there is a single entry point, so the development time is constant no matter how many tables need to be ingested into the data lake.
Hortonworks & Syncsort Discuss Partnership Milestones
Syncsort CTO Tendü Yoğurtçu, PhD, and Hortonworks CTO Scott Gnau made a joint appearance on theCUBE to provide updates on the partnership between the two companies, which helps customers unlock data from legacy systems and bring it into the data lake.
Hortonworks’ Scott Gnau and Syncsort’s Tendü Yoğurtçu talk to theCUBE live from DataWorks Summit 2017
Both companies are very focused on data governance. DMX-h captures the metadata of the data ingestion and publishes it to Atlas. Syncsort has also announced a partnership with Collibra.
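To make the metadata flow concrete: Apache Atlas exposes a v2 REST API for registering entities, and an ingestion tool can publish lineage by posting an entity payload to it. The sketch below is only an illustration of that pattern, not DMX-h’s actual integration; the `dmx_dataset` type name and the sample attributes are hypothetical.

```python
import json

# Atlas's v2 entity endpoint (relative to the Atlas server base URL).
ATLAS_ENTITY_ENDPOINT = "/api/atlas/v2/entity"

def build_atlas_entity(type_name, qualified_name, attributes=None):
    """Build an entity payload in the shape Atlas's v2 entity API expects.

    Atlas identifies entities by typeName plus a unique qualifiedName;
    everything else goes into the attributes map.
    """
    attrs = {
        "qualifiedName": qualified_name,
        "name": qualified_name.split("@")[0],  # strip the cluster suffix
    }
    attrs.update(attributes or {})
    return {"entity": {"typeName": type_name, "attributes": attrs}}

# Hypothetical dataset registered after an ingestion job completes:
payload = build_atlas_entity(
    "dmx_dataset",                    # hypothetical custom Atlas type
    "claims_2017@prod_cluster",
    {"sourceSystem": "mainframe", "format": "avro"},
)
print(json.dumps(payload, indent=2))
# A real integration would POST this JSON to ATLAS_ENTITY_ENDPOINT
# with the cluster's authentication, e.g. via the requests library.
```

Publishing metadata this way is what lets governance tools answer "where did this table come from?" after the data lands in the lake.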
Tendü also discussed Syncsort’s new efforts to keep data in the lake fresh by offering alternative Change Data Capture (CDC) capabilities, tailored to different use cases where legacy data is also consumed in the data lake.
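The simplest form of change data capture is a snapshot diff: compare yesterday's and today's extracts by primary key and emit insert, update, and delete events. The talk did not describe DMX-h's actual CDC mechanism (which for mainframe sources typically reads change logs rather than full snapshots), so the following is just a minimal sketch of the general idea with made-up sample rows.

```python
def snapshot_diff(old, new):
    """Compare two snapshots keyed by primary key and emit change events.

    Returns a list of (operation, key, row) tuples, where operation is
    'insert', 'update', or 'delete'.
    """
    events = []
    for key, row in new.items():
        if key not in old:
            events.append(("insert", key, row))       # new key appeared
        elif old[key] != row:
            events.append(("update", key, row))       # existing row changed
    for key, row in old.items():
        if key not in new:
            events.append(("delete", key, row))       # key disappeared
    return events

# Hypothetical snapshots of a small reference table:
yesterday = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
today     = {1: {"name": "Ann"}, 2: {"name": "Bobby"}, 3: {"name": "Cal"}}
print(snapshot_diff(yesterday, today))
# → [('update', 2, {'name': 'Bobby'}), ('insert', 3, {'name': 'Cal'})]
```

Log-based CDC avoids re-reading whole tables, which is why it matters for the large legacy sources discussed in the session; the diff above only illustrates what the emitted change events look like.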
Beyond Its Early Days, Hadoop Is a Formidable Industry Player
An informal poll during Thursday’s keynotes by John Kreisa, VP of International Marketing at Hortonworks, showed that the majority of the audience had been using Hadoop for 1 to 3 years.
The focus on governance, security, and user experience greatly advances the maturity of the platform. It comes as a response to feedback from existing customers on how to remove blockers and enable enterprises of all sizes to take advantage of the transformative powers of Big Data in their businesses.
For tips on successfully integrating mainframe data into Hadoop, read the TDWI checklist report, Building a Data Lake with Legacy Data.