Overcoming Three Major Challenges of Moving Data from the Mainframe to Hadoop
When it comes to extracting value from the wealth of data many companies have at their disposal, Hadoop is the platform of choice. It is built on a number of capable, well-supported open source tools designed specifically for big data analytics. But while Hadoop’s popularity continues to climb, there is one glaring gap in its capabilities: it provides no native support for mainframe data. Importing mainframe data into a Hadoop environment and processing it there to extract value can be difficult, time-consuming, and costly.
Syncsort DMX-h is specifically designed to bridge the gap between the mainframe and Hadoop. It drastically simplifies the process of transferring data from mainframes to Hadoop clusters, overcoming several difficult challenges in the process. Let’s take a look at some of those challenges, and how DMX-h addresses them.
Challenges with Data
It might seem that transferring data from a mainframe into a Hadoop data lake should be a simple matter of uploading it via FTP or Connect:Direct. But it’s a lot more complicated than that. Mainframe data exists in many different forms and formats, including VSAM files, fixed- and variable-length sequential files, and Db2 and IMS databases, with record layouts that are often described in COBOL Copybooks. Text is typically encoded in EBCDIC rather than ASCII, numeric fields are often stored as packed decimal, and the data may be compressed.
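To make the encoding problem concrete, here is a minimal Python sketch of decoding one EBCDIC text field and one packed decimal field by hand. It assumes the US/Canada EBCDIC code page (cp037) and uses made-up sample bytes. The function names are illustrative only and have nothing to do with how DMX-h works internally.

```python
from decimal import Decimal

EBCDIC_CODEC = "cp037"  # assumption: US/Canada EBCDIC code page


def decode_text(raw: bytes) -> str:
    """Decode an EBCDIC text field and trim trailing pad spaces."""
    return raw.decode(EBCDIC_CODEC).rstrip()


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed decimal field: two digits per byte, with the
    sign stored in the low nibble of the last byte."""
    if not raw:
        raise ValueError("empty packed decimal field")
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # the last nibble is the sign, not a digit
    if sign < 0x0A or any(digit > 9 for digit in nibbles):
        raise ValueError("not a valid packed decimal field")
    magnitude = int("".join(str(digit) for digit in nibbles))
    if sign in (0x0B, 0x0D):  # B and D mark negative values
        magnitude = -magnitude
    return Decimal(magnitude).scaleb(-scale)


# Made-up sample fields: "HELLO" padded with EBCDIC spaces (0x40),
# and -123.45 stored in three bytes with two implied decimal places.
name = decode_text(bytes.fromhex("C8C5D3D3D64040"))
amount = unpack_comp3(bytes.fromhex("12345D"), scale=2)
```

Read as ASCII, the same bytes would be gibberish, which is why a plain file transfer is not enough.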
COBOL Copybooks are a particular problem. They are metadata that define the physical layout of the data but are stored separately from it. They can be quite complex, containing not just field definitions but also layout logic, such as nested OCCURS DEPENDING ON clauses, which make the number of repeated fields depend on a value in the record itself. Hadoop knows nothing of Copybooks, and without that knowledge it has no way to understand the structure of mainframe data.
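As a rough illustration of why an Occurs Depending On clause complicates things, the Python sketch below parses a record whose length depends on a count field inside the record. The copybook shown in the comment, the field names, and the sample record are all hypothetical, and cp037 is again an assumed code page.

```python
EBCDIC_CODEC = "cp037"  # assumption: US/Canada EBCDIC code page

# Hypothetical copybook describing the record parsed below:
#   01 CUSTOMER-REC.
#      05 CUST-ID      PIC X(4).
#      05 ORDER-COUNT  PIC 9(2).
#      05 ORDERS OCCURS 0 TO 10 TIMES DEPENDING ON ORDER-COUNT.
#         10 ORDER-CODE PIC X(3).
HEADER_LEN = 6  # CUST-ID (4 bytes) + ORDER-COUNT (2 bytes)
ORDER_LEN = 3   # one ORDER-CODE entry


def parse_customer(record: bytes) -> dict:
    """Parse one record; the number of ORDERS entries is read from the record."""
    cust_id = record[0:4].decode(EBCDIC_CODEC)
    order_count = int(record[4:6].decode(EBCDIC_CODEC))
    end = HEADER_LEN + order_count * ORDER_LEN
    if len(record) < end:
        raise ValueError("record is shorter than ORDER-COUNT implies")
    orders = [
        record[start:start + ORDER_LEN].decode(EBCDIC_CODEC)
        for start in range(HEADER_LEN, end, ORDER_LEN)
    ]
    return {"cust_id": cust_id, "orders": orders, "length": end}


# Build a sample record in EBCDIC: customer C001 with two orders.
sample = "C00102ABCXYZ".encode(EBCDIC_CODEC)
parsed = parse_customer(sample)
```

Because each record can have a different length, a reader that lacks the copybook cannot even tell where one record ends and the next begins.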
Because of all these unfamiliar structural variations, feeding mainframe data in its native form into a Hadoop cluster would cause a bad case of digital indigestion!
Syncsort DMX-h is specifically designed to handle these data formatting variations. It supports all mainframe data types and formats, and can ingest such data into a Hadoop cluster and process it there without changing its format. In many cases, that ability to preserve the original form of the data is necessary to meet governance, compliance, and auditing mandates.
Challenges with Security
In today’s world, maintaining the highest level of data security, whether the data is in transit or at rest, is an absolute necessity. Mainframes are noted for their extremely high levels of data security, but what happens when that data is exported to Hadoop?
Syncsort DMX-h protects data security on both the mainframe and Hadoop ends in several ways. First, because it does not require the installation of any additional applications on the mainframe, it adds nothing on that side that could weaken security. Then, through its support for FTPS, Connect:Direct, and encrypted datasets, DMX-h protects data during the transfer process. Finally, with its native support of Kerberos and LDAP, and its close integration with Apache Sentry, DMX-h maintains that high level of security when the data is stored and processed within the Hadoop cluster.
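For a sense of what an encrypted transfer involves when done by hand, here is a minimal sketch using FTPS from Python's standard ftplib module. It is not how DMX-h performs transfers. The host, credentials, and dataset name are placeholders supplied by the caller, and the sketch assumes the mainframe's FTP server accepts TLS.

```python
from ftplib import FTP_TLS


def download_binary(ftps, remote_name: str, local_path: str) -> int:
    """Fetch remote_name over an encrypted data channel; return bytes written."""
    ftps.prot_p()  # encrypt the data connection, not just the login
    written = 0
    with open(local_path, "wb") as out:
        def save(chunk: bytes) -> None:
            nonlocal written
            out.write(chunk)
            written += len(chunk)
        # retrbinary uses image (binary) mode, so EBCDIC and packed
        # decimal bytes arrive exactly as they are stored on the mainframe.
        ftps.retrbinary(f"RETR {remote_name}", save)
    return written


def fetch_from_mainframe(host, user, password, remote_name, local_path) -> int:
    """Connect, log in over TLS, and download one dataset (all values are placeholders)."""
    with FTP_TLS(host) as ftps:
        ftps.login(user, password)  # login() secures the control channel first
        return download_binary(ftps, remote_name, local_path)
```

The binary transfer matters as much as the encryption: a text-mode transfer would translate bytes in flight and corrupt packed decimal fields.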
Challenges with Staff Skills
Mainframers who also have Hadoop skills, or Hadoop mavens who also understand mainframes, are extremely rare creatures. In other words, the chances of finding IT staff members who can handle the technical complexities of the entire mainframe-to-Hadoop process are pretty low. That’s why Syncsort DMX-h is designed to automate much of the work of integrating mainframe data into a Hadoop environment.
DMX-h sports a simple, user-friendly, highly intuitive GUI (graphical user interface) that allows application developers, who may have little or no mainframe experience, to simply point and click to build and maintain any desired data integration workflow. DMX-h handles complex Hadoop tasks, such as creating mappers and reducers for MapReduce, entirely on its own.
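To show the kind of logic a developer would otherwise write by hand, here is a toy mapper and reducer in Python that total an amount per region. It only illustrates the MapReduce pattern and runs in memory. The "region,amount" record format is hypothetical, and nothing here represents code that DMX-h generates.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit a (region, amount) pair for each 'region,amount' input line."""
    for line in lines:
        region, amount = line.strip().split(",")
        yield region, int(amount)


def reducer(pairs):
    """Sum the amounts for each region.

    Sorting by key stands in for the shuffle phase that Hadoop
    performs between the map and reduce steps.
    """
    by_region = sorted(pairs, key=itemgetter(0))
    for region, group in groupby(by_region, key=itemgetter(0)):
        yield region, sum(amount for _, amount in group)


# Hypothetical input records.
totals = dict(reducer(mapper(["east,10", "west,5", "east,7"])))
```

Even this small job requires thinking in key-value pairs and grouping, which is the skill gap described above.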
Syncsort DMX-h Helps Companies Get the Most Out of Their Mainframe Data
The potential value of the data stored on corporate mainframes is immense. Until recently, though, the best data analytics tools, such as Hadoop, were not available for the mainframe, which made fully leveraging that data difficult and expensive. That’s no longer the case. Syncsort DMX-h makes adding the richness of mainframe-based data to Hadoop data lakes not only practical but also relatively easy.