Offloading Mainframe to Hadoop: How Much Can You Save?
There are many reasons to migrate to the Hadoop architecture for managing big data. Some choose Hadoop for the availability of source code. Some are attracted by the large and growing community of software engineers and system admins who are competent with the Hadoop platform. Some like the convenience of swapping out inexpensive components on the fly instead of shutting down a big monolithic system whenever maintenance or repairs are needed.
But the bottom line, for many Hadoop sites, is the bottom line. More and more companies are choosing Hadoop because it saves money compared to traditional mainframe approaches.
Where the Savings Come In
The open-source tools, bustling community of expert workers, and commodity hardware associated with Hadoop already reduce costs for data management and processing. But the biggest savings come from offloading data and processing from legacy mainframe systems.
Until recently, mainframes were the platform of choice for storing and processing big data. That’s what they were invented to do, after all.
But mainframes are optimized for data processing and analysis. That’s an important job and it justifies the high cost of purchasing, maintaining, and operating big iron. It’s wasteful, however, to use mainframes for data storage when inexpensive commodity-based storage alternatives are available. And it’s criminal to use the enterprise’s most expensive computing resource to prepare huge amounts of data and transform the data’s formats for use in computations. Data preparation and transformation can consume expensive hours of mainframe computing time, yet they are relatively straightforward operations that in no way require the mainframe’s advanced processing capabilities. It’s a waste. And in many enterprises, it accounts for as much as half of the mainframe’s cycles.
A Hybrid Architecture
Hadoop-based solutions have been proposed as a replacement for mainframes, but many enterprises are reluctant to toss out big-iron systems that have served them faithfully for years. That reluctance has given rise to a new hybrid architecture in which inexpensive, commodity-based Hadoop systems are used in tandem with mainframes. More and more system architects are using arrays of cheap commodity hardware to load and transform data for consumption by the mainframe, and reserving the mainframe’s computational power for bulk data crunching.
How much can you save? Industry experts say that a mainframe-based data warehouse can cost $100,000 or even $200,000 per terabyte. A Hadoop-based system, in contrast, costs just $400 to $1,000 per terabyte. That’s a tremendous savings. Plus, the mainframe will continue to run the analytic software that it always ran — no extensive retooling is required.
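To put those per-terabyte figures side by side, here is a minimal Python sketch. It uses only the cost ranges quoted above; the 100 TB warehouse size and the function names are hypothetical, chosen purely for illustration.

```python
# Per-terabyte cost ranges quoted in the article (US dollars).
MAINFRAME_COST_PER_TB = (100_000, 200_000)
HADOOP_COST_PER_TB = (400, 1_000)


def storage_cost_range(terabytes, cost_per_tb):
    """Return the (low, high) cost of holding `terabytes` of data."""
    low, high = cost_per_tb
    return terabytes * low, terabytes * high


def cost_ratio_range(mainframe=MAINFRAME_COST_PER_TB, hadoop=HADOOP_COST_PER_TB):
    """Return the (smallest, largest) mainframe-to-Hadoop cost ratio."""
    smallest = mainframe[0] / hadoop[1]  # cheapest mainframe vs. priciest Hadoop
    largest = mainframe[1] / hadoop[0]   # priciest mainframe vs. cheapest Hadoop
    return smallest, largest


if __name__ == "__main__":
    size_tb = 100  # hypothetical warehouse size, not a figure from the article
    print("Mainframe cost range:", storage_cost_range(size_tb, MAINFRAME_COST_PER_TB))
    print("Hadoop cost range:   ", storage_cost_range(size_tb, HADOOP_COST_PER_TB))
    print("Cost ratio range:    ", cost_ratio_range())
```

On the quoted ranges, the mainframe works out to somewhere between 100 and 500 times the cost of Hadoop per terabyte. The sketch compares storage cost only and ignores migration, staffing, and licensing, which the article does not quantify.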
Extending the Mainframe’s Capacity
And there is some comfort in this approach too, as IT managers can continue to use the mainframes that have served them well. In fact, offloading data collection and processing tasks to Hadoop can extend the life of the mainframe and effectively multiply its capacity. The mainframe can handle a larger data set if it’s not called upon to do all the loading and preprocessing, so enterprises save money that they would otherwise spend upgrading their legacy systems.
Adding Hadoop to the system architecture can result in significant savings.
And that’s just cost avoidance. The Hadoop-plus-mainframe hybrid architecture can also lead to better business decision-making. When the store-and-transform operations are offloaded to Hadoop-based server clusters, the mainframe can handle five years of historical data instead of two. Fifty states instead of 10. Worldwide sales data instead of regional summaries. Those are the kinds of improvements that empower management to make better decisions and grow their businesses.
Rearchitecting Information Services
The 2013 Hadoop Summit in San Jose, California, included a presentation by Sunilkuma Kakade, IT director of big data consulting firm MetaScale, and Aashish Chandra, divisional vice president of MetaScale’s parent company, Sears Holdings. Kakade and Chandra outlined their vision of an effective but incremental restructuring of the traditional IT department. Keep the mainframe for its unique processing power, the two advised, but remove it from the center of the company’s data-processing architecture. Instead, use a cluster of inexpensive Hadoop servers as the enterprise data hub.
In response to concerns about performance, Kakade and Chandra showed benchmarks demonstrating that moving a data-sorting task from the mainframe to Hadoop resulted in a time savings of 95.6 percent. Sorting took 45 minutes on the mainframe using JCL, while an equivalent routine written for Hadoop executed in less than two minutes on commodity-priced hardware.
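The benchmark numbers can be cross-checked with a few lines of Python. This is only an arithmetic sketch based on the two figures reported above (45 minutes and 95.6 percent); the function names are hypothetical.

```python
def runtime_after_savings(before_minutes, savings_percent):
    """Run time that remains after a given percentage of time is saved."""
    return before_minutes * (1 - savings_percent / 100)


def time_savings_percent(before_minutes, after_minutes):
    """Percentage of run time eliminated when a job drops from before to after."""
    return (before_minutes - after_minutes) / before_minutes * 100


if __name__ == "__main__":
    mainframe_minutes = 45.0   # sort time on the mainframe, as reported
    reported_savings = 95.6    # percent time savings, as reported

    hadoop_minutes = runtime_after_savings(mainframe_minutes, reported_savings)
    print(f"Implied Hadoop run time: {hadoop_minutes:.2f} minutes")
    print(f"Savings recomputed: "
          f"{time_savings_percent(mainframe_minutes, hadoop_minutes):.1f} percent")
```

A 95.6 percent saving on a 45-minute job leaves roughly 1.98 minutes, which agrees with the "less than two minutes" reported for the Hadoop routine.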
Part of this gain comes from Syncsort’s contributions to the Hadoop core. Thanks to that work, Hadoop isn’t just less expensive; it can also be much faster.
The Bottom Line
Offloading data storage and reformatting tasks to Hadoop can result in real cost savings in IT. In addition to direct savings, it can extend the life of existing mainframe resources. And it can contribute to better decision-making by extending the mainframe’s capacity.
More than 50 percent of mainframes currently run Syncsort software, which offers a seamless migration path to Hadoop, so the cost benefits are easy to achieve. That’s just one more reason that Hadoop in the enterprise is a solution whose time has come.