Mainframe Real-time and Streaming Analytics

Big data analytics are like cookies. They’re much better when they’re fresh.


What I mean is this: If you analyze data as soon as it is created – in other words, if you perform real-time analytics – you get much more value out of it than if you wait until the data is a bit older. Sure, data analytics are still useful if the data is a day or a week old – just as cookies still taste pretty good the day after they’re baked. But nothing beats the taste of soft, gooey cookies straight out of the oven.

If you want to perform big data analytics in real time, it's essential either to run analytics directly on the mainframe systems where the data is generated or to stream that data in real time to other platforms for processing. This article explains why and how to perform real-time data analytics on mainframes.

The value of real-time analytics on mainframe data


Before delving into the how of streaming analytics on mainframes, let’s spend a few minutes explaining why being able to run analytics in real time on mainframe systems is so useful.

The short explanation is this: Mainframes tend to serve as hubs for some of the most important and rapidly changing data used by enterprises. Those enterprises can make much better use of that data if they can process it in real time.

After all, consider the types of data that are typically produced and stored by mainframe systems. Credit card transactions and other banking information are one obvious example. Data on passengers and flight status for airlines is another. Databases used by government agencies are a third. These are all mission-critical kinds of data, and they are updated constantly.

If you don't process mainframe data sets like these in real time, they risk becoming stale before you get any results. Analyzing flight status across an airline's network won't be very useful if the data is even a few hours old, because the state of the network will have changed significantly by the time your results are in.

Similarly, consider what is at stake when you use data analytics to detect potentially fraudulent credit card transactions by identifying unusual patterns in payment data. Finding and blocking illegitimate transactions in real time is far better than waiting until a transaction has already cleared and the thieves have made off with the goods before you can act.
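To make "identifying unusual patterns" a little more concrete, here is a toy Python sketch of a per-transaction check that runs as each payment arrives. It is an illustration under stated assumptions, not how Syncsort or any card network actually detects fraud: it uses a single hypothetical signal (how many standard deviations an amount sits above that card's own running average), and the names SpendMonitor, threshold and warmup are made up for this example. The running statistics use Welford's online algorithm, so each card needs only constant memory and no historical batch has to be reloaded.

```python
import math
from dataclasses import dataclass, field


@dataclass
class RunningStats:
    """Online mean and variance via Welford's algorithm (O(1) memory)."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self) -> float:
        # Sample standard deviation; reported as 0.0 with fewer than two values.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


@dataclass
class SpendMonitor:
    """Hypothetical check: flag amounts far above a card's own history.

    threshold: standard deviations above the mean that count as unusual.
    warmup: past transactions required before anything can be flagged.
    """
    threshold: float = 4.0
    warmup: int = 5
    history: dict = field(default_factory=dict)

    def check(self, card_id: str, amount: float) -> bool:
        stats = self.history.setdefault(card_id, RunningStats())
        suspicious = False
        if stats.n >= self.warmup:
            sd = stats.stddev()
            # If every past amount was identical, sd is 0 and nothing is flagged.
            if sd > 0:
                suspicious = (amount - stats.mean) / sd > self.threshold
        if not suspicious:
            # Learn only from accepted transactions, so a burst of
            # fraudulent charges cannot quietly shift the card's baseline.
            stats.update(amount)
        return suspicious


if __name__ == "__main__":
    monitor = SpendMonitor()
    for amt in [20.0, 25.0, 22.0, 18.0, 24.0, 21.0]:
        monitor.check("card-1", amt)
    print(monitor.check("card-1", 900.0))  # True: far outside the usual range
    print(monitor.check("card-1", 23.0))   # False
```

A real system would combine many signals (merchant, location, timing), but the point carries over: the check happens while the transaction is still in flight, which only works if the data reaches the analytics code in real time.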

The challenge of mainframe streaming analytics

The fact that mainframe data tends to be large and to change constantly creates a challenge for real-time analytics on mainframe systems. If you have to offload a large amount of rapidly changing data to an analytics platform before you can analyze it, the offloaded data is likely to be out of date by the time you can run analytics operations.

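Add to that the problem of mainframe data being formatted in ways that popular analytics platforms such as Hadoop do not support directly (for example, EBCDIC-encoded text, packed-decimal numeric fields, and fixed-length record layouts defined by COBOL copybooks), and you face a steep barrier to real-time analytics on mainframe systems.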
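To show what that formatting gap looks like in practice, here is a minimal Python sketch that decodes one mainframe-style record into ordinary values. It assumes a hypothetical 15-byte layout: a 10-character account ID stored as EBCDIC text (code page 037, where the letter "A" is byte 0xC1 rather than ASCII's 0x41) followed by an amount stored as a COBOL COMP-3 packed-decimal field declared PIC S9(7)V99. The record layout and the function names are illustrative assumptions; real layouts come from each shop's own copybooks.

```python
from decimal import Decimal


def unpack_comp3(field: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COBOL COMP-3) field.

    Each byte holds two 4-bit digits, except that the final nibble is the
    sign: 0xD means negative; 0xC and 0xF mean positive or unsigned.
    """
    nibbles = []
    for byte in field:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign_nibble = nibbles.pop()  # last nibble is the sign, not a digit
    if any(d > 9 for d in nibbles):
        raise ValueError(f"invalid packed-decimal digits in {field.hex()}")
    if sign_nibble not in (0x0C, 0x0D, 0x0F):
        raise ValueError(f"invalid packed-decimal sign nibble {sign_nibble:#x}")
    value = 0
    for d in nibbles:
        value = value * 10 + d
    if sign_nibble == 0x0D:
        value = -value
    # Apply the implied decimal point (PIC S9(7)V99 has two decimal places).
    return Decimal(value).scaleb(-scale)


def parse_record(record: bytes) -> dict:
    """Parse one record in the hypothetical 15-byte layout:

    bytes 0-9   account ID, PIC X(10), EBCDIC text (code page 037)
    bytes 10-14 amount, PIC S9(7)V99 COMP-3 (9 digits + sign = 5 bytes)
    """
    if len(record) != 15:
        raise ValueError(f"expected 15 bytes, got {len(record)}")
    return {
        "account": record[0:10].decode("cp037"),
        "amount": unpack_comp3(record[10:15], scale=2),
    }


if __name__ == "__main__":
    raw = "ACCT000042".encode("cp037") + bytes([0x00, 0x01, 0x23, 0x45, 0x6D])
    print(parse_record(raw))  # {'account': 'ACCT000042', 'amount': Decimal('-1234.56')}
```

Nothing here is hard on its own, but every field of every record type needs this kind of layout-specific translation before a platform like Hadoop or Spark can use the data, and in a real-time pipeline that translation has to keep pace with the mainframe.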

Mainframe-friendly streaming data analytics

That, at least, has traditionally been the case. But with the real-time data analytics technologies available today, bringing streaming analytics to mainframes is now perfectly possible.

Modern real-time analytics tools include Apache Kudu, a columnar storage engine for the Hadoop ecosystem that supports fast analytics on rapidly changing data, and Spark Streaming, an extension of Apache Spark for processing live data streams. These are great tools for running analytics in real time and acting on the results.

If you’re working with data on a mainframe, however, these tools get you only part of the way to real-time analytics. Since Kudu, Hadoop and Spark were not designed with mainframes in mind, you need to find a way to feed your mainframe data into them, in a format they can support. And that all has to happen in real time, of course, if you want real-time analytics.

Bad approaches to mainframe analytics

There are several do-it-yourself strategies that could, in theory, let you analyze mainframe data in real time using one of the platforms described above. You could use log replication so that your mainframe data is copied automatically to a location where Hadoop can work with it. Or you could store your mainframe data in a format that Hadoop supports and analyze it in place.

Those would be unreliable, complicated solutions, however. Do-it-yourself log replication can get messy because mainframe data formats differ from the formats that Hadoop supports. Analyzing in place usually requires an overhaul of your mainframe storage system, which is infeasible for organizations whose mainframe infrastructure is entrenched and hard to change.

Syncsort squares the mainframe analytics circle


A better approach is to use software that automatically streams mainframe data to Hadoop or Spark so that it can be processed in real time. That's exactly what Syncsort's Big Iron to Big Data solutions do. DMX-h provides a single interface for accessing and integrating all your enterprise data sources, batch and streaming, across Hadoop, Spark, Linux, Unix or Windows, on premises or in the cloud. Ironstream lets you analyze mainframe data with familiar tools like Splunk.

In other words, Syncsort lets you square the circle: you can have your mainframe data and analyze it in real time, too. Gone are the days of waiting for mainframe data to be offloaded into an analytics platform, or of relying on a clunky workflow built from hacks that tries to analyze mainframe data in real time but delivers imperfect results.

Authored by Christopher Tozzi

Christopher Tozzi has written about emerging technologies for a decade. His latest book, For Fun and Profit: A History of the Free and Open Source Software Revolution, is forthcoming with MIT Press in July 2017.