Hadoop, Data Lakes and Building a Next Gen Big Data Architecture
This blog was originally posted as a guest blog on the Hortonworks blog site.
This week, Hortonworks announced an exciting expansion of our long-standing partnership. Hortonworks will now resell Syncsort’s leading Hadoop data integration software, DMX-h, for onboarding ETL processing in Hadoop. DMX-h will enable our joint customers to easily access and collect data from a diverse set of enterprise data sources, including RDBMSs, mainframes and emerging streaming data sources, and to bring all of that data to the Hortonworks Data Platform (HDP™) for bigger insights and greater business agility.
Our expanded partnership with Hortonworks primarily targets two challenges: big data integration, and getting an organization’s existing skills productive on Hadoop quickly.
This is the first time that Hortonworks, a leading innovator in open and connected platforms, has chosen to resell commercial partner software. We are excited that Hortonworks has chosen DMX-h, and excited about the added value it can bring to Hortonworks customers.
Why did Hortonworks choose Syncsort’s data integration solution? Because of Syncsort’s unique value proposition: enabling organizations to access and integrate enterprise-wide data and bring all of it to the Hadoop data lake. The key differentiators from other data integration solutions include:
- Easy and lightweight ETL deployment on premises and in the cloud, and quick onboarding with Hadoop
- Simple, secure and efficient approach to building the Hadoop data lake by accessing all enterprise data sources, including mainframe
- Strong commitment to, and track record of, contributing to the Apache Hadoop and Apache Spark open source projects, yielding scalability, interoperability and future-proofing benefits through DMX-h’s native integration with Hadoop
- Trustworthy global presence in 87% of enterprise Fortune 500 companies
Due to Syncsort’s ongoing contributions to Apache Hadoop projects, DMX-h is natively integrated into the Hadoop data flow, providing interoperability and scalability. DMX-h is deployed via Apache Ambari, integrates out of the box with security frameworks including Kerberos, Apache Ranger and Knox, and is integrated with HCatalog, providing data governance across platforms. Syncsort DMX-h’s ‘design once, deploy anywhere’ native architecture helped many organizations run their applications seamlessly when migrating from MapReduce v1 to YARN, and guarantees the same future-proofing for emerging compute platforms, such as Apache Spark.
As Scott Gnau’s recent blog post outlined, future-proofing is critical to dealing with rapid change in the technology stack, and a platform that can connect existing enterprise data, whether from legacy mainframes or RDBMSs, with the new streaming data systems will be key to the success of big data initiatives. To ensure solid ROI, organizations must include all critical sources of enterprise data, including those that have traditionally been handled in a silo, such as mainframe, while leveraging existing skill sets and a cost-effective, scalable platform such as HDP.
Syncsort and Hortonworks are committed to helping enterprises overcome the challenges of transitioning to the next-generation data architecture. Our customers are already taking advantage of our joint offering. One of our Fortune 500 customers is using DMX-h on HDP for customer churn analytics, improving customer service and driving operational efficiencies. They populate HDP with data from RDBMSs including Oracle, SQL Server, and DB2 on the mainframe. Syncsort automates metadata mapping from these data sources to Hive tables, ensuring secure access to data and populating the data lake. DMX-h is used both to synchronize the data on the cluster and to prepare it for advanced analytics.
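To make "metadata mapping from these data sources to Hive tables" concrete, here is a minimal Python sketch of the general idea: translating source column types into Hive column types and emitting a table definition. It is only an illustration, not how DMX-h works internally. The `TYPE_MAP` lookup, the function names `to_hive_type` and `hive_ddl`, and the sample `customers` table are all hypothetical, and the mapping is deliberately simplified.

```python
from typing import List, Tuple

# Hypothetical, simplified source-type -> Hive-type lookup.
# Illustrative only; real mappings depend on the source system and precision.
TYPE_MAP = {
    "NUMBER": "DECIMAL(38,10)",
    "DECIMAL": "DECIMAL(38,10)",
    "VARCHAR2": "STRING",
    "VARCHAR": "STRING",
    "NVARCHAR": "STRING",
    "CHAR": "STRING",
    "SMALLINT": "SMALLINT",
    "INT": "INT",
    "INTEGER": "INT",
    "BIGINT": "BIGINT",
    # Oracle DATE carries a time component, so TIMESTAMP is the lossless choice.
    "DATE": "TIMESTAMP",
    "DATETIME": "TIMESTAMP",
    "TIMESTAMP": "TIMESTAMP",
}


def to_hive_type(source_type: str) -> str:
    """Map a source column type such as 'VARCHAR2(50)' to a Hive type."""
    base = source_type.split("(")[0].strip().upper()
    # Unknown types fall back to STRING so no column is silently dropped.
    return TYPE_MAP.get(base, "STRING")


def hive_ddl(table: str, columns: List[Tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement from (column name, source type) pairs."""
    cols = ",\n".join(
        f"  `{name.lower()}` {to_hive_type(src_type)}" for name, src_type in columns
    )
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n{cols}\n)\nSTORED AS ORC"


if __name__ == "__main__":
    # Hypothetical source schema for a customer table.
    print(hive_ddl("customers", [
        ("CUST_ID", "BIGINT"),
        ("NAME", "VARCHAR2(100)"),
        ("SIGNUP_DATE", "DATE"),
    ]))
```

Automating this step matters because hand-written type mappings across Oracle, SQL Server and DB2 tend to drift and to lose precision without anyone noticing.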
This single software environment provides a data pipeline that can be used for both batch and streaming data, insulating applications from the underlying compute framework, whether Hadoop MapReduce or Spark, on premises or in the cloud. Customers benefit from rapid application development using existing skills, agility in integrating additional data sources such as streaming data, and portability for future cloud deployment. All of these advantages are delivered securely and in keeping with regulatory compliance requirements, enabling organizations to bring new services and products to market quickly with increased return on investment.
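The idea of one pipeline definition serving both batch and streaming data can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not DMX-h's design: plain Python functions stand in for the compute frameworks, and the step names (`normalize_name`, `tag_source`) and runner names (`run_batch`, `run_streaming`) are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, str]
Step = Callable[[Record], Record]


def normalize_name(rec: Record) -> Record:
    """Trim whitespace and title-case the 'name' field."""
    return {**rec, "name": rec["name"].strip().title()}


def tag_source(rec: Record) -> Record:
    """Ensure every record carries a 'source' field."""
    return {**rec, "source": rec.get("source", "unknown")}


# The pipeline is defined once, independently of how it will be executed.
PIPELINE: List[Step] = [normalize_name, tag_source]


def apply_steps(rec: Record, steps: List[Step]) -> Record:
    for step in steps:
        rec = step(rec)
    return rec


def run_batch(records: List[Record], steps: List[Step]) -> List[Record]:
    """Process a complete, bounded data set in one pass."""
    return [apply_steps(r, steps) for r in records]


def run_streaming(records: Iterable[Record], steps: List[Step]) -> Iterator[Record]:
    """Process records lazily, one at a time, as they arrive."""
    for r in records:
        yield apply_steps(r, steps)


if __name__ == "__main__":
    print(run_batch([{"name": "  ada lovelace "}], PIPELINE))
    for out in run_streaming(iter([{"name": "alan turing", "source": "mainframe"}]), PIPELINE):
        print(out)
```

Because the steps know nothing about the runner, swapping the execution layer leaves the application logic untouched, which is the property the paragraph above describes.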
Together, HDP and Syncsort DMX-h offer organizations a trusted solution for integrating ETL workflows with connected data platforms. At the Hadoop Summit in Dublin, Hortonworks and Syncsort were together at the Hortonworks booth to provide attendees with more information on the joint solution. Be sure to visit us at the Hadoop Summit in San Jose in June!