In case you missed Part I and Part II of our blog series, “Simply the Best,” I’m Paige Roberts, the new Product Manager for big data. Our last two posts looked at why customers are increasingly moving data from the mainframe to Hadoop, along with the many benefits of using DMX-h to do so.
Today, Arnie Farrelly, the VP of Global Support and Services, will share insights into real customer use cases.
So, can you tell me about any specific customer who is using this capability?
It’s brand new, just introduced in the latest version, but we have several customers who have really been asking for it.
What’s a good example of an in-production DMX-h mainframe-to-Hadoop customer?
Well, there are quite a few. One large insurance industry customer needed to populate their data lake very rapidly. Their challenge was that they had hundreds of tables in DB2 on the mainframe, hundreds of Oracle tables, and mainframe VSAM files as well. They needed a way to get all that data into the data lake quickly. Even with a point-and-click interface, designing 600 jobs that each simply took one table, moved it onto Hadoop, and wrote it out would take too long. They’d looked at Informatica and other solutions, but they weren’t able to find anything that would actually work. They moved their VSAM files into Hadoop with DMX-h, no problem. For the databases, we just recently developed a utility called DataFunnel that can ingest a lot of tables all at once. You can take an entire database schema, hundreds of tables, and load it into HDFS very quickly. They were able to load, I think, 1.4 terabytes of Oracle data and over 600 DB2 tables. That was huge for them. They thought it was the greatest thing since sliced bread.
DataFunnel works on any kind of database, not just mainframe DB2?
Any source or target that our DMX-h engine supports, DataFunnel supports or can support, and that covers a lot of ground.
It just grabs hundreds of tables at a time?
Exactly. It runs in parallel across all the different data nodes so you can ingest data into the cluster in parallel.
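DataFunnel itself is a Syncsort product, so its internals aren’t shown here, but the pattern Arnie describes, taking every table in a schema and ingesting them concurrently rather than one job at a time, can be sketched in a few lines. Everything below (the `ingest_table` and `funnel` names, the stand-in copy step) is hypothetical; a real pipeline would extract from DB2 or Oracle and write to HDFS.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def ingest_table(table_name, rows):
    """Copy one table's rows to the target (stand-in for extract + load)."""
    copied = list(rows)
    return table_name, len(copied)

def funnel(schema):
    """Ingest every table in a schema in parallel, DataFunnel-style.

    Instead of designing one job per table, submit all tables at once
    and let a worker pool move them concurrently.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(ingest_table, name, rows)
                   for name, rows in schema.items()]
        for f in as_completed(futures):
            name, count = f.result()
            results[name] = count
    return results

# A hypothetical schema of five tables of varying sizes.
schema = {f"TBL_{i:03d}": range(100 * i) for i in range(1, 6)}
print(funnel(schema))
```

The point of the sketch is the shape of the work: one submission covers the whole schema, and the per-table transfers overlap rather than queue, which is what makes loading hundreds of tables tractable.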
That’s impressive. I could have used one of those a few years back. Do you have another example of a mainframe-to-Hadoop customer?
Well, a large US bank that we can’t name right now was one of our early DMX-h adopters. They were using something called JRecord, an open source utility that lets you get mainframe data into Hadoop. It requires a fair amount of programming; it’s a programmer’s kind of tool. The problem for the bank was that they had incredibly complex copybooks, like 83-page copybooks, with lots of redefines in them. They used us to take that complex data with all the redefines, pull it in, transform it, and load it into HDFS, and did so very easily. The POC was completed in a day. We went right in and showed them we could parse that complex copybook, which was no small feat.
Just right out of the box, you could parse that with DMX-h?
Within a day.
JRecord couldn’t do that?
Well, maybe they could, but it would have taken a fair amount of work and expensive developer time. That’s the thing we’re hearing a lot. A lot of the challenges people are facing with Hadoop are doable. They’re solvable. They’re just really complicated and difficult, and require a whole lot of time-consuming work to get them done. Companies don’t want to have to go out and hire more people, train them on Hadoop and wait forever. Our product fits in nicely, saving money, time and aggravation by simplifying the process.
Learn why mainframe data is an essential part of your data hub and how DMX-h can help break through the common barriers for data access.
So, about this theme I’ve been hearing around mainframe to Hadoop, “Simply the Best.” Why do you feel Syncsort is simply the best for this use case? What does it bring to the table that’s so special?
A few things:
First, Syncsort was processing big data before it was a buzzword. Back when sorting two gigs of data was a huge, seemingly insurmountable task, we invented a high-performance sort product that could process far more than that very efficiently. It blew away the market on performance and efficiency. So we get the big data problem of trying to process more data than standard software is designed to handle. We know big data.
Second, we know mainframes. Our history on mainframes is decades long, and half our business is still building and selling mainframe software. We have IronStream, a product that streams mainframe log data into Splunk, and it sells like hotcakes. Our understanding of mainframes is unmatched in the market.
And finally, we know Hadoop. We’ve got a team of Hadoop experts who know the best practices. We are among the top 10 contributors to core Hadoop, and we have contributed to other projects like Spark, Sqoop, and Parquet. We have partnered tightly with Cloudera for years, and we are partnered, certified, and have customers deployed on all the major distributions: Hortonworks, MapR, IBM, and Pivotal. We are committed to and immersed in Hadoop.
Put those three things together and we have a perfect storm. There really isn’t anyone else out there who can touch us for any project with both a mainframe and Hadoop involved. If your organization has a mainframe running key systems, and is moving to a Hadoop implementation, you would really be silly NOT to use Syncsort.
It’s a triple-threat!
(laughing) Yeah, we’re a triple-threat. We’re also very agile as a company, and the skills and knowledge on this team are incredible. We know Hadoop, we know mainframes, we know our products, and we know the space. In a lot of cases, if you need a feature we don’t have, we can extend the product within a week or two. That’s definitely not true of some other vendors in the market. They’ll tell you, “Oh, that will be on the roadmap. Look for it in a year or two, maybe.” We’re able to move on customer needs, and really deliver a solution that works. I know it may almost sound ridiculous, but a solution that works is sometimes a rare thing in this area. We hear this a lot from customers: “I tried other solutions, and they’re just not working.” We go in, show them a demo, and they’re already convinced. Because our product is straightforward to use, you can see how it works and what it does immediately.
At the end of the day, customers are not looking for a complex solution. We make complex problems very simple. That’s one thing we do exceptionally well as a company.
That’s why we’re Simply the Best.
Check out our video here to see DMX-h in action, accessing mainframe data and loading it into Hadoop.