Today, Syncsort announced its strategy for, and entry into, the Hadoop community. This is really exciting: our customers have told us they are pushing more and more work into Hadoop as “big data” grows in their enterprises and the need to scale becomes even more critical for their businesses. Our developers are really excited about what Syncsort is doing, as well. Even though we are an East Coast-based company, several of them are even threatening to dye their hair purple... Hadoop purple!
Our announcement has two major components. The first is that we intend to contribute an external sort “plug-in” to the community. There have been calls in the past for performance enhancements and other optimizations to sort. With this contribution, anyone could seamlessly plug their own sort engine, including Syncsort’s solution (more details on that below), into Hadoop by using the published interfaces. With Syncsort’s 40+ years of experience in sorting, we believe we have unique expertise we can apply for the benefit of the larger Hadoop community.
While other data integration vendors are talking about Hadoop, we have not seen any of them embrace the community by making contributions. We believe this distinguishes Syncsort’s entry into the community, and we hope it is viewed as a sign of our sincerity and excitement about working with the open source community and customers to make Hadoop better and even more valuable than it is today.
The second part of our announcement is the new DMExpress Hadoop Edition. Entering a limited-availability beta period in June, this new offering will encompass three components:
- HDFS connectivity: extract data from and load data into HDFS. We can actually do this today with examples we ship in the product. If you’re a DMExpress customer, check this out in the online help.
- The sort acceleration piece that builds on our contribution (discussed above) to improve sort performance. Our marketing team (who I think are also dabbling in the purple hair thing) is calling this Hadoop Acceleration. While we are contributing the plug-in to the community, the actual sort engine from Syncsort will be delivered in this new DMExpress Hadoop Edition. As you can see from our announcement, we have seen some pretty good performance improvements. We will continue to benchmark our acceleration throughout the beta period. Stay tuned to this blog for more results.
- The ability to create MapReduce jobs in the DMExpress graphical environment, rather than writing Java, Pig scripts, etc. If you know DMExpress, this is the Task Editor. If you need to write data transformations, reformat data, build aggregations, etc., you can now use our Task Editor. DMExpress will automatically deploy to the Hadoop cluster, source the data from HDFS, and run the transformations across the cluster. Not only is the processing faster, but the jobs are also much easier to write and maintain.
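For readers who have not written a MapReduce job by hand, here is a rough feel for the pieces discussed above: a map step, a sort step that can be swapped out, and a reduce step that aggregates. This is only a toy, single-process sketch in Python. It is not Hadoop's API, not the interface of our planned contribution, and not how DMExpress works internally. Every name in it (`run_job`, `default_sort`, `bucket_sort`, and the `region,amount` record layout) is a hypothetical stand-in chosen for illustration.

```python
from collections import defaultdict
from itertools import groupby
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical types for this sketch; none of these names come from Hadoop or DMExpress.
Pair = Tuple[str, int]
SortEngine = Callable[[List[Pair]], List[Pair]]


def default_sort(pairs: List[Pair]) -> List[Pair]:
    """Baseline sort step: order the key/value pairs by key."""
    return sorted(pairs, key=lambda kv: kv[0])


def bucket_sort(pairs: List[Pair]) -> List[Pair]:
    """An alternative sort step that can be plugged in instead of default_sort.

    It buckets pairs by key, then emits the buckets in key order.
    """
    buckets: Dict[str, List[Pair]] = defaultdict(list)
    for key, value in pairs:
        buckets[key].append((key, value))
    return [pair for key in sorted(buckets) for pair in buckets[key]]


def map_phase(records: Iterable[str]) -> Iterator[Pair]:
    """Map step: parse 'region,amount' lines into (region, amount) pairs."""
    for line in records:
        region, amount = line.split(",")
        yield region.strip(), int(amount)


def reduce_phase(sorted_pairs: List[Pair]) -> Dict[str, int]:
    """Reduce step: sum amounts per key.

    groupby only merges adjacent equal keys, so this relies on the sort step.
    """
    return {
        key: sum(value for _, value in group)
        for key, group in groupby(sorted_pairs, key=lambda kv: kv[0])
    }


def run_job(records: Iterable[str], sort_engine: SortEngine = default_sort) -> Dict[str, int]:
    """Run map -> sort -> reduce, with the sort step supplied by the caller."""
    pairs = list(map_phase(records))
    return reduce_phase(sort_engine(pairs))


if __name__ == "__main__":
    sample = ["east,10", "west,5", "east,7", "north,3", "west,1"]
    print(run_job(sample))                           # uses the baseline sort
    print(run_job(sample, sort_engine=bucket_sort))  # same job, different sort engine
```

The point of the sketch is the `sort_engine` parameter: the job logic does not change when a different sort is supplied, as long as the replacement returns the pairs ordered by key. On a real cluster the same three steps run distributed across many machines.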
This is obviously just the beginning. I am very excited about our announcement today and our entry into the Hadoop community. We have received overwhelmingly positive feedback from our customers and the industry analysts we have briefed. Stay tuned for more details and results from our beta testing. I even promise to post pictures of any Syncsort developers or marketing folks who actually follow through with the purple hair!