During DataWorks 2017, held this past week in San Jose, Syncsort CTO Tendü Yoğurtçu and Hortonworks CTO Scott Gnau reunited for a joint interview on theCUBE, SiliconANGLE TV’s live broadcast, with co-hosts Lisa Martin & George Gilbert.
This is not the first time Tendu and Scott have joined forces. They appeared together at this same event last year to launch the EDW Offload solution. In this week’s interview, they discuss what’s happening with the partnership, including new developments that add value to the joint solution. Here’s a full recap of the interview:
Syncsort + Hortonworks Partnership: Value, Strength and New Technology
Scott is very bullish on Hortonworks’ partnership with Syncsort, pointing out that it is built on the foundation of accelerating joint customers’ time to value and leveraging our mutual strengths. He states that we continue to add new capabilities, new integration of our technologies, and new customer successes. Scott characterizes the partnership as a great foundation in value that continues to grow. He compliments “the latest moves by Syncsort, that allow joint customers to get much more benefit than they originally anticipated.”
Scott also notes that there is a lot of greenfield opportunity with hundreds of companies whose legacy data systems are doing a great job of running business-facing applications, with data that is a valuable data lake source but is currently “locked away.” Accessing that data and joining it with other data in the data lake is key. The value of the partnership, he notes, is to create an easy bridge for that data to the data lake.
This enhances the value of all the other data being built there. He cites use case examples such as legacy data as reference data for consistency of response when joined with social and web data, or as an online archive and optimization of the overall data fabric. He also points out it can be used to offload some of the historical data which may not even be used in legacy systems, but can be placed in the data lake where it can actually be available for data analytics.
When asked if we will continue to expand the functional footprint of the joint solution, Tendu states that we are jointly focused on liberating data from EDW or legacy architectures, understanding the path that data took and publishing metadata into Atlas to support it as an open data governance solution. She notes that this is an area where we see an opportunity to grow and also strengthen joint solutions.
Scott adds that this is essentially extended “provenance,” which is a really big deal: 90% of the cost of building legacy systems goes into creating business rules and metadata, so it’s important to preserve that information. Tendu points out that the joint solution is already used in large production deployments, helping customers access all enterprise data, including legacy, mainframe and new data sources, in the data lake.
New Change Data Capture Capabilities Add Value
Tendu discloses that Syncsort will be announcing new change data capture (CDC) capabilities next week. CDC from legacy data stores into Hadoop will keep data fresh and give customers more choices in terms of populating the data lake, including use cases like archiving data into the Cloud.
She explains that EDW Optimization focuses on business use cases, making all enterprise data accessible in the data lake. “If you are an insurance company managing claims or building a Hadoop-as-a-service architecture, there are multiple ways you can keep the data fresh in the data lake. As data volumes grow and real-time analytics requirements of the business are growing, our customers are looking for a way to capture changes in real-time.”
Partnering in the Lab and in the Field
Scott also touts that Syncsort and Hortonworks engineering teams have done very low-level integration of mutual technology so DMX-h can exploit core services like YARN for multi-tenancy and workload management, and Atlas for data governance. “As the Syncsort team adds functionality, that simply accretes to the benefit of what we’ve built together.”
He adds that the whole notion of governance and metadata management is a big deal. And now, regardless of what applications they choose, customers have a common, trusted infrastructure where all the information is tagged and stays with the data throughout its lifecycle.
Tendu points out that Atlas is an important integration point. The partnership is cross-functional with engineering, marketing and field teams working together to help customers build a modern data architecture. There is so much data from cloud, mobile and the web, and a lot of legacy data stores. We are targeting use cases on how to enable data stores that combine this data in the data lake.
Host George Gilbert, a Big Data & analytics analyst for Wikibon, is impressed by all the new developments and remarks on the combination of data integration, new CDC capabilities and data quality software, characterizing its ability to make data flows “much faster and high fidelity.” At the end of Scott and Tendu’s commentary, host Lisa Martin recognizes that the partnership is incredibly strong on the technology, strategy and go-to-market (GTM) sides.
Leveraging the Value of Syncsort’s Expanded Product Portfolio
Beyond the Hortonworks partnership, Tendu explains how the Trillium Software acquisition has been transformative for Syncsort – allowing the organization to deliver joint solutions from data integration and data quality & profiling portfolios. She shares that recent first steps have been focused on data governance use cases.
Tendu adds that data quality in the data lake is a big focus. She touches on this week’s Collibra partnership announcement, pointing out the importance of making business rules and technical metadata available through dashboards for data scientists. Tendu explains that data goes through our high-performance processing engine, and we leverage our unmatched capabilities to profile that data and get a better understanding of it. Customers can create business rules to cleanse the data and preserve its fidelity. In addition, we can efficiently stream data through Kafka and highly efficient proprietary methods.
Read our eBook Bringing Big Data to Life: Overcoming Challenges of Legacy Data in Hadoop to understand the challenges associated with integrating mainframe data into Hadoop and how to solve them.