This post originally appeared on the Hortonworks blog as a guest blog by Syncsort’s Tendü Yoğurtçu, who provides insight on how EDW optimization can help drive success of a modern enterprise data warehouse solution.
Syncsort and Hortonworks working together to drive the success of a modern EDW solution
The Enterprise Data Warehouse (EDW) has become a standard component of the corporate data architecture. In the past 15 years, a variety of product offerings have been introduced into the market for building EDWs, operational data stores and real-time data warehouses. What sets these offerings apart is their speed and performance and, most recently, how they handle the convergence of structured and unstructured data.
In today’s guest blog, we have invited Tendü Yoğurtçu, General Manager for Big Data, to discuss how Syncsort can help drive the success of a modern EDW solution.
During last week’s Gartner Data and Analytics Summit, the keynote focused on the need for organizations to transform scarcity of trust, value, insights, and skills into abundance in order to become a data-centric organization and create business value. Our partnership with Hortonworks and joint solution offering for EDW Optimization empowers our customers to do just that, by making it easy to onboard all their critical enterprise data to HDP for actionable insights they can trust, using the skills they already have for the fastest time to value.
Businesses are striving to get the most value out of their data and turn it into actionable insights. The shift towards becoming a data-centric organization requires a modern data architecture with the ability to access all critical enterprise data at the right time. This seemingly simple proposition actually can be very complex in practice. Most organizations find themselves challenged by the diversity of data sources and types, the rapidly evolving Big Data technology stack, and the resources and time required to deliver ROI.
Today’s data engineers, data warehouse engineers and data architects face dual challenges. They need to understand the new data sources from mobile, connected devices, web traffic, IoT, etc., but they must also integrate this data with the existing, and often legacy, critical data assets. This requires the right set of skills and tools that are able to work with a variety of data formats, sources and compute platforms. To make things more complicated, the Big Data technology stack is constantly evolving, and skill gaps slow down projects and delay time to value.
Tools that simplify access to, and integration of, all data, and continuously adapt to the latest technologies, help to close these gaps and deliver faster time to value.
We at Syncsort are excited about the joint solution offering with Hortonworks to augment the enterprise data warehouse and extend the life of customers’ current investments by offloading ETL processing from the EDW to Hadoop and enriching the EDW data with the new data sources in Hortonworks Data Platform (HDP).
Syncsort DMX-h enables organizations to liberate data from all sources across the enterprise, and access that data to create actionable insights with the latest compute frameworks, including Spark 2.0. Our single software environment and easy-to-use interface provide an “easy button” to create and populate the data lake with data from virtually any source (from legacy systems to Kafka, JSON and more), so you can get up and running fast, on-premises or in the cloud. Syncsort DMX-h runs natively in the Hadoop cluster and comes with native connectors to RDBMSs and all mainframe data sources. These complex legacy data sets traditionally require very specialized skill sets, so the ability of DMX-h to speak to these legacy data stores natively makes data access significantly easier.
When accessing data from an EDW and integrating into a data lake, it’s important to understand the data and to leverage the existing schemas. Once again, DMX-h greatly simplifies the process by automatically creating and mapping the metadata from the enterprise data warehouse to the target data formats in the Hadoop data lake. Data engineers simply point to the EDW and populate the data lake in a single step, securely and efficiently.
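To make the idea of schema mapping concrete, here is a minimal Python sketch of what translating source column definitions into a Hive table definition can look like when done by hand. It does not show how DMX-h works internally. The type table, the function names `to_hive_type` and `hive_ddl`, the choice of ORC storage and the fallback for numbers without a declared precision are all assumptions made for illustration.

```python
import re

# Illustrative mapping only: a hand-picked subset of source types,
# not the rules any particular product applies.
SOURCE_TO_HIVE = {
    "VARCHAR2": "STRING",
    "VARCHAR": "STRING",
    "CHAR": "STRING",
    "INTEGER": "INT",
    "SMALLINT": "SMALLINT",
    "BIGINT": "BIGINT",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}

# Matches a type name with an optional "(precision[, scale])" suffix.
TYPE_PATTERN = re.compile(r"\s*(\w+)\s*(?:\(\s*(\d+)\s*(?:,\s*(\d+)\s*)?\))?\s*")


def to_hive_type(source_type):
    """Translate one source column type, e.g. 'NUMBER(10,2)', to a Hive type."""
    match = TYPE_PATTERN.fullmatch(source_type)
    if match is None:
        raise ValueError(f"cannot parse type: {source_type!r}")
    base = match.group(1).upper()
    precision, scale = match.group(2), match.group(3)
    if base in ("NUMBER", "DECIMAL", "NUMERIC"):
        if precision is None:
            # Assumption: no declared precision falls back to DOUBLE (lossy).
            return "DOUBLE"
        return f"DECIMAL({precision},{scale or 0})"
    if base not in SOURCE_TO_HIVE:
        raise ValueError(f"no mapping defined for type: {source_type!r}")
    return SOURCE_TO_HIVE[base]


def hive_ddl(table, columns):
    """Build a CREATE TABLE statement from (column name, source type) pairs."""
    cols = ",\n".join(
        f"  `{name.lower()}` {to_hive_type(source_type)}"
        for name, source_type in columns
    )
    return f"CREATE TABLE IF NOT EXISTS `{table.lower()}` (\n{cols}\n) STORED AS ORC;"


if __name__ == "__main__":
    # Hypothetical source table, used only to exercise the sketch.
    print(hive_ddl("CUSTOMERS", [
        ("ID", "INTEGER"),
        ("NAME", "VARCHAR2(40)"),
        ("BALANCE", "NUMBER(10,2)"),
    ]))
```

Even this toy version has to make decisions about precision, unknown types and naming for every column, which is why doing the mapping manually across thousands of tables takes so long.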
One of our mutual customers illustrates how the joint solution delivers significant benefits. This organization has thousands of tables in DB2/z and Oracle, as well as data in VSAM files. All of this data needs to be accessed and made available for advanced analytics in HDP, and it also needs to be refreshed. They use Syncsort DMX-h to automatically map the schemas from DB2/z and Oracle, and the metadata from copybooks, to create the Hive tables and populate the data in Hadoop. Instead of spending weeks understanding the source metadata, creating the corresponding Hive tables, mapping the data formats and populating the data lake, they can do this in a couple of hours with DMX-h. The time to value and business agility they gain with our joint solution have a direct positive impact on their business.