Overcoming the Challenges of Getting Your Mainframe Data into Hadoop

No discussion of big data is complete without addressing mainframe data. Depending on whom you ask, roughly 60 to 80 percent of the world’s transactional data is stored on mainframes. This transactional data is a gold mine of reference data that can be used to make sense of enterprise-wide data and drive your big data analytics, but getting it off the mainframe is, well, challenging. That is especially true if you need to move it off the mainframe yet keep it in its native mainframe format, which is essential for maintaining data lineage and compliance. Here are the main challenges of integrating mainframe data into Hadoop and Spark, and how they can be overcome.

Most estimates put the amount of the world’s transactional data stored on the mainframe around 70 percent. That’s too much data to ignore when assembling your plans for big data.

Due to their cost-effective scalability, Hadoop and Spark have taken hold in just about every large enterprise. Yet industries such as banking, insurance, and healthcare haven’t been able to fully leverage these platforms, because much of their critical data lives on the mainframe and can’t be altered due to regulatory mandates. Syncsort’s DMX-h software allows you to quickly access mainframe data unchanged and integrate it with other enterprise data sources, without the need for specialized skills in either Hadoop or the mainframe. By copying the data via DMX-h, you preserve the data lineage for governance purposes while eliminating much of the latency often associated with these tasks, and it takes only a few clicks to do.

Challenge: Addressing Hadoop’s Connectivity Issues with the Mainframe

Integrating mainframe data into Hadoop has been problematic because Hadoop has no native connectivity to, or processing capabilities for, mainframe data. Syncsort DMX-h solves this issue, allowing organizations to work with mainframe data in Hadoop or Spark in its native format, which is essential for maintaining data lineage and compliance. It also offers support for FTPS and Connect:Direct. Syncsort reports that customers describe the current release of DMX-h as a solution that lets them do things that were previously impossible: it both simplifies and secures the process of accessing and integrating mainframe data with Big Data platforms, and it helps organizations with governance when loading mainframe data into Hadoop. Since Syncsort is a contributor to the open source mainframe access libraries in both Apache Sqoop and Apache Spark, DMX-h extends these connections to offer additional support for file types, data types, and COBOL copybooks.
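To make the open source route concrete, here is a minimal sketch, not DMX-h itself, of pulling a mainframe dataset into HDFS with Apache Sqoop’s import-mainframe tool, which Syncsort contributed to the project. The host name, dataset name, credentials, and paths below are hypothetical.

```python
# Rough sketch of the open source route DMX-h builds on: Apache Sqoop's
# import-mainframe tool pulls a partitioned dataset from z/OS into HDFS.
# Host, dataset, credentials, and paths are hypothetical.
import subprocess

MAINFRAME_HOST = "zos.example.com"                   # hypothetical z/OS host
DATASET = "PROD.CUSTOMER.MASTER"                     # hypothetical dataset name
TARGET_DIR = "/data/raw/mainframe/customer_master"   # HDFS landing directory

cmd = [
    "sqoop", "import-mainframe",
    "--connect", MAINFRAME_HOST,
    "--dataset", DATASET,
    "--username", "sqoopusr",
    "--password-file", "hdfs:///user/sqoopusr/.mainframe.pwd",
    "--target-dir", TARGET_DIR,
    "-m", "4",                                       # parallel mappers
]

# Each record lands as text in HDFS; DMX-h goes further by keeping the
# original mainframe encoding and applying the COBOL copybook layout.
subprocess.run(cmd, check=True)
```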

So much data, so little time. That’s another area where Syncsort DMX-h can help.

It can take a frustrating amount of time and effort to load database tables into Hadoop, primarily because developers must build individual loads for each and every table. With DMX DataFunnel™, you can easily ingest hundreds of DB2 tables into Hadoop in one fell swoop, and even extract and migrate entire database schemas in a single invocation. One customer had this to say: “Syncsort’s DataFunnel utility has been a powerful tool in our Data Lake strategy. We were able to ingest into Hadoop over 800 tables from one source system … with one press of the button, all while leveraging our existing DMX-h install. Its configuration-based approach provides great flexibility from source to target. DataFunnel is a powerful data pump for our Data Lake.”
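DataFunnel itself is configuration-driven, but as a rough illustration of the per-table work it automates, here is a hand-rolled PySpark sketch that loops over a list of DB2 tables and lands each one in Hadoop as Parquet. The connection details, table names, and paths are hypothetical, and this is not the DataFunnel API.

```python
# Hand-rolled illustration of table-by-table DB2 ingest without a tool like
# DMX DataFunnel: one JDBC read and one HDFS write per table.
# Connection details, table names, and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("db2-ingest-sketch").getOrCreate()

JDBC_URL = "jdbc:db2://db2.example.com:50000/PRODDB"   # hypothetical DB2 source
PROPS = {"user": "ingest", "password": "secret",
         "driver": "com.ibm.db2.jcc.DB2Driver"}
TABLES = ["SALES.ORDERS", "SALES.ORDER_LINES", "CRM.CUSTOMERS"]  # hundreds in practice

for table in TABLES:
    # Read the source table over JDBC and write it into the data lake as Parquet.
    df = spark.read.jdbc(url=JDBC_URL, table=table, properties=PROPS)
    target = "/data/lake/db2/" + table.replace(".", "/").lower()
    df.write.mode("overwrite").parquet(target)
```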

Challenge: There Is Only a Limited Window in Which to Access Mainframe Data

Access to mainframe data is limited to short windows in which users must extract extremely large quantities of data, and attempting to translate and unpack the data in transit takes too much time. With Syncsort DMX-h, data can be copied from the mainframe to Hadoop very efficiently while keeping the mainframe formatting. Once the data is in Hadoop, DMX-h takes advantage of the cluster’s distributed resources to access and integrate the data natively, without staging a translated copy.
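To see why in-transit translation is so costly, and why copying the data in mainframe format and decoding it later on the cluster is attractive, here is a small Python sketch that decodes one hypothetical fixed-length record containing an EBCDIC (code page 037) text field and a COMP-3 packed-decimal amount. The 14-byte layout is invented for illustration; in practice a COBOL copybook defines the offsets, lengths, and picture clauses.

```python
# Decode a single fixed-length mainframe record that was copied to Hadoop
# unchanged: an EBCDIC text field (code page 037) plus a COMP-3 packed-decimal
# amount. The record layout here is hypothetical.
import codecs
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Unpack an IBM packed-decimal (COMP-3) field into a Decimal."""
    digits = ""
    for byte in raw[:-1]:
        digits += f"{byte >> 4}{byte & 0x0F}"      # two digits per byte
    last = raw[-1]
    digits += str(last >> 4)                       # final digit
    sign = -1 if (last & 0x0F) == 0x0D else 1      # 0xD marks a negative value
    return Decimal(sign * int(digits)) / (10 ** scale)

# Hypothetical 14-byte record: 10-byte EBCDIC name + 4-byte COMP-3 amount (2 decimals)
record = codecs.encode("JANE DOE  ", "cp037") + bytes([0x12, 0x34, 0x56, 0x7C])

name = codecs.decode(record[:10], "cp037").strip()
amount = unpack_comp3(record[10:14], scale=2)
print(name, amount)   # JANE DOE 12345.67
```

Because decoding like this can run in parallel across the cluster after the copy lands, the extraction window on the mainframe stays short.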

Does Syncsort DMX-h sound like the perfect option for you? You can see this product and all of Syncsort’s Big Data solutions here.