
Managing unprecedented data growth while keeping up with performance requirements is one of the major challenges organizations face in the era of ‘Big Data.’ In their quest to leverage information to properly support business objectives, IT departments have historically turned to inefficient workarounds that create significant complexity and escalating costs. In fact, according to a recent research report by the analyst firm Enterprise Strategy Group, data integration complexity was the number one data analytics challenge, cited by more than 270 responding organizations.

That’s why more and more organizations are embarking on new initiatives to address a fundamental problem: accelerating performance while reducing the cost and complexity of their data integration environments. ETL 2.0, a new approach that Syncsort recently began advocating, is our answer to this challenge. Today, we are pleased to announce DMExpress 7.0, the foundation that we believe will start to make ETL 2.0 a reality for thousands of organizations.

DMExpress 7.0 incorporates key enhancements that further extend our leadership as the only ETL vendor that can provide a fast, efficient, simple, cost-effective approach to data integration. More importantly, the 7.0 release allows organizations to leverage more data than ever before while maintaining the flexibility, ease-of-use, and cost efficiencies required by the business.  The release includes significant enhancements in three key areas:

Scalable, Self-Tuning ETL Engine. This is the core of our software and a clear differentiator for Syncsort. A scalable, self-tuning ETL engine lets organizations process data transformations in memory, within the ETL engine itself, eliminating the need to stage data in the database and freeing up database capacity. DMExpress 7.0 includes significant performance enhancements and I/O optimizations that deliver even faster performance at scale.

Enterprise Connectivity & Acceleration. DMExpress provides the ability to leverage virtually all sources of data as well as accelerate previous investments in data integration environments. DMExpress 7.0 extends Syncsort’s leadership in this area by delivering:

  • Tighter integration with Hadoop, including Load/Extract capabilities
  • Faster, easier data integration acceleration through bulk metadata import capabilities
  • Improved support for mainframe application modernization projects, with expanded JCL SORT control statement support and out-of-the-box integration with leading re-hosting solutions
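For readers who haven’t worked with mainframe sorts, a JCL SORT control statement such as `SORT FIELDS=(1,5,CH,A,6,2,CH,D)` lists its sort keys as groups of start position, length, format, and order. The Python sketch below is purely illustrative and is not how DMExpress implements this: it assumes character (`CH`) keys only, compares them as Python strings rather than in EBCDIC collating order, and the function names `parse_sort_fields` and `sort_records` are hypothetical.

```python
import re

# Matches a control statement such as: SORT FIELDS=(1,5,CH,A,6,2,CH,D)
_SORT_STMT = re.compile(r"^\s*SORT\s+FIELDS=\((?P<spec>[^)]*)\)", re.IGNORECASE)


def parse_sort_fields(statement: str) -> list[tuple[int, int, str, str]]:
    """Parse SORT FIELDS into (start, length, format, order) keys.

    Only the CH (character) format is handled in this sketch; positions are
    1-based, following the mainframe convention.
    """
    match = _SORT_STMT.match(statement)
    if not match:
        raise ValueError(f"not a SORT FIELDS statement: {statement!r}")
    parts = [p.strip().upper() for p in match.group("spec").split(",")]
    if len(parts) % 4 != 0:
        raise ValueError("expected groups of start,length,format,order")
    keys = []
    for i in range(0, len(parts), 4):
        start, length, fmt, order = parts[i:i + 4]
        if fmt != "CH" or order not in ("A", "D"):
            raise ValueError(f"unsupported key: {parts[i:i + 4]}")
        keys.append((int(start), int(length), fmt, order))
    return keys


def sort_records(records: list[str], keys) -> list[str]:
    """Sort fixed-width records by the parsed keys."""
    result = list(records)
    # Sort by the least significant key first. Python's sort is stable (also
    # with reverse=True), so the more significant keys applied later win.
    for start, length, _fmt, order in reversed(keys):
        result.sort(key=lambda r, s=start, n=length: r[s - 1:s - 1 + n],
                    reverse=(order == "D"))
    return result
```

With `SORT FIELDS=(1,5,CH,A,6,2,CH,D)`, the records `SMITHNY001`, `JONESCA002`, and `SMITHCA003` come back as `JONESCA002`, `SMITHNY001`, `SMITHCA003`: ascending on the first key, with ties broken by the second key in descending order.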

Simple, Agile ETL Development. We believe it should not take days just to install and configure data integration tools; in a matter of days, you should be able to install the software, develop your designs, and deploy them into production. That’s why DMExpress 7.0 extends the software’s ease of use, with features that encourage reuse and collaboration to increase overall productivity.

A couple of months ago at TDWI World in San Diego, Syncsort defined a new approach to ETL, one that would deliver on the long-overdue promises of data integration tools. We called this approach ETL 2.0. Today, from TDWI World in Orlando, we are announcing DMExpress 7.0, the first solution on the market to deliver this completely new approach to data integration and the foundation of ETL 2.0.

DMExpress 7.0 will be generally available in December. In the meantime, stay tuned for more details on our upcoming DMExpress 7.0 release, straight from our R&D labs!


Today, Syncsort announced our strategy for, and our entry into, the Hadoop community. This is really exciting, as our customers have told us they are pushing more and more into Hadoop as “big data” grows in their enterprises and the need to scale becomes even more critical for their businesses. Our developers are really excited about what Syncsort is doing, as well. Even though we are an East Coast-based company, several of them are even threatening to dye their hair purple…Hadoop purple!

Our announcement has two major components. The first is that we intend to contribute an external sort “plug-in” to the community. There have been calls in the past for performance enhancements and other optimizations to Hadoop’s sort. With this contribution, anyone could seamlessly plug a sort engine of their choice into Hadoop through the published interfaces, including Syncsort’s own (more details on that below). With Syncsort’s 40+ years of experience in sorting, we believe we have unique expertise we can apply for the benefit of the larger Hadoop community.
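To make the idea of a pluggable sort concrete, here is a minimal Python sketch of what such a contract might look like, assuming the framework hands the engine a stream of (key, value) pairs and expects them back in key order. `SortPlugin`, `register_sort_plugin`, and `load_sort_plugin` are hypothetical names for illustration, not the interfaces we are contributing to Hadoop, and the chunked engine keeps its sorted runs in memory where a real external sort would spill them to disk.

```python
import heapq
from abc import ABC, abstractmethod
from typing import Callable, Iterable, Iterator

Record = tuple[bytes, bytes]  # (key, value), as a map task might emit


class SortPlugin(ABC):
    """Hypothetical contract a replacement sort engine would implement."""

    @abstractmethod
    def sort(self, records: Iterable[Record]) -> Iterator[Record]:
        """Return the records ordered by key."""


class BuiltinSort(SortPlugin):
    """Default engine: a plain in-memory sort standing in for the built-in one."""

    def sort(self, records: Iterable[Record]) -> Iterator[Record]:
        return iter(sorted(records, key=lambda kv: kv[0]))


class ChunkedMergeSort(SortPlugin):
    """Alternative engine: sort fixed-size runs, then k-way merge them.

    A real external sort would write each run to disk; here runs stay in memory.
    """

    def __init__(self, run_size: int = 10_000):
        self.run_size = run_size

    def sort(self, records: Iterable[Record]) -> Iterator[Record]:
        data = list(records)
        runs = [sorted(data[i:i + self.run_size], key=lambda kv: kv[0])
                for i in range(0, len(data), self.run_size)]
        return heapq.merge(*runs, key=lambda kv: kv[0])


_REGISTRY: dict[str, Callable[[], SortPlugin]] = {
    "builtin": BuiltinSort,
    "chunked": ChunkedMergeSort,
}


def register_sort_plugin(name: str, factory: Callable[[], SortPlugin]) -> None:
    """Make an engine selectable by name, e.g. from a job setting."""
    _REGISTRY[name] = factory


def load_sort_plugin(name: str = "builtin") -> SortPlugin:
    """Instantiate the engine registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"no sort plugin registered as {name!r}") from None
```

Swapping engines then becomes a configuration choice (`load_sort_plugin("chunked")` instead of the default) rather than a change to the jobs themselves, which is the point of publishing an interface.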

While other data integration vendors are talking about Hadoop, we have not seen any of them embrace the community by making contributions. We believe this distinguishes Syncsort’s entry into the community, and we hope it is viewed as a sign of our sincerity and our excitement about working with the open source community and customers to truly make Hadoop better and even more valuable than it is today.

The second part of our announcement is the new DMExpress Hadoop Edition. Entering a limited-availability beta period in June, this new offering will include three components:

  1. HDFS connectivity: extract data from and load data into HDFS. We can actually do this today using examples that ship with the product. If you’re a DMExpress customer, check this out in the online help.
  2. Sort acceleration, building on our contribution (discussed above) to actually improve sort performance. Our marketing team (some of whom I think are also dabbling in the purple hair thing) is calling this Hadoop Acceleration. While we are contributing the plug-in, Syncsort’s actual sort engine will be part of the new DMExpress Hadoop Edition. As you can see from our announcement, we have seen some pretty good performance improvements. We will continue to benchmark our acceleration throughout the beta period. Stay tuned to this blog for more results.
  3. The ability to create MapReduce jobs in the DMExpress graphical environment, rather than writing Java, Pig scripts, etc. If you know DMExpress, this is the Task Editor. If you need to write data transformations, reformat data, perform aggregations, etc., you can now use our Task Editor. DMExpress will automatically deploy the job on the Hadoop cluster, sourcing data from HDFS and running the transformations across the cluster. Not only is the processing faster, the jobs are much easier to write and maintain.
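Jobs built in the Task Editor still run as MapReduce on the cluster. As a rough, single-process illustration of what a simple sum-per-region aggregation boils down to, here is plain Python with hypothetical function names and an assumed `region,amount` input format; it is not code that DMExpress generates. The shuffle/sort step in the middle is where a faster sort pays off.

```python
from collections.abc import Iterable, Iterator
from itertools import groupby
from operator import itemgetter


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, float]]:
    """Map: reformat each 'region,amount' line into a (key, value) pair."""
    for line in lines:
        region, amount = line.split(",")
        yield region.strip().upper(), float(amount)


def shuffle_sort(pairs: Iterable[tuple[str, float]]) -> list[tuple[str, float]]:
    """Shuffle/sort: order pairs by key so each reducer sees a key's values together.

    On a real cluster this is the step a sort plug-in would accelerate.
    """
    return sorted(pairs, key=itemgetter(0))


def reduce_phase(sorted_pairs: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Reduce: aggregate (sum) the values for each key."""
    return {key: sum(value for _, value in group)
            for key, group in groupby(sorted_pairs, key=itemgetter(0))}


def run_job(lines: Iterable[str]) -> dict[str, float]:
    """Chain the three phases, as the framework would across many nodes."""
    return reduce_phase(shuffle_sort(map_phase(lines)))
```

For example, `run_job(["east,10", "west,5", "East,2.5"])` returns `{"EAST": 12.5, "WEST": 5.0}`.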

This is obviously just the beginning. I am very excited about our announcement today and our entry into the Hadoop community. We have received overwhelmingly positive feedback from our customers and the industry analysts we have briefed. Stay tuned for more details and results from our beta testing. I even promise to post pictures of any Syncsort developers or marketing folks who actually follow through with the purple hair!


As I leave “Vegas for kids” (aka Orlando), I realize that I have seen the light.  That light is metadata or, more specifically, metadata interchange. Metadata interchange is critical for our customers and for Syncsort. As the well-known television commercial says, “this changes everything.”

In my previous post on DMExpress 6.5, I talked about this concept of “snap”.  I want to add some more detail about this here.

The first step in a DI Acceleration project, or any DI/ETL project for that matter, is to understand the source systems being read and the target system(s) being written. Metadata Interchange will allow users to import the source and target schemas from DI/ETL tools (Informatica PowerCenter, DataStage, etc.), databases, files, COBOL copybooks (really?!), etc. This will allow users to match the metadata in a DI Acceleration scenario. In DMExpress, this corresponds to individual sources and targets (file, pipe, database table, table in memory, etc.) and their associated metadata (delimited field layouts, fixed-position field layouts, XML schemas, extracted database columns, COBOL copybooks, etc.). Pretty cool so far!
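As a rough illustration of what turning a COBOL copybook into a fixed-position field layout involves, the sketch below handles only flat `PIC X(n)` and `PIC 9(n)` items. It is a toy built on assumptions, not the Metadata Interchange importer: `copybook_to_layout` and `split_record` are hypothetical names, and OCCURS, REDEFINES, COMP, and signed fields, which a real importer must handle, are not supported.

```python
import re

# Matches elementary items such as "05 CUST-ID PIC X(6)." or "05 QTY PIC 9(5)."
_FIELD = re.compile(
    r"^\s*\d{2}\s+(?P<name>[A-Z0-9-]+)\s+PIC\s+(?P<type>[X9])\((?P<len>\d+)\)\s*\.?\s*$",
    re.IGNORECASE,
)


def copybook_to_layout(copybook: str) -> list[dict]:
    """Turn simple PIC X(n)/9(n) items into a fixed-position field layout."""
    layout, offset = [], 1  # positions are 1-based, as in mainframe tooling
    for line in copybook.splitlines():
        m = _FIELD.match(line)
        if not m:
            # Group items and comments are skipped. Anything else that fails
            # to match (OCCURS, COMP, ...) is also skipped, which a real
            # importer could not afford because it would shift the offsets.
            continue
        length = int(m.group("len"))
        layout.append({
            "name": m.group("name").upper(),
            "start": offset,
            "length": length,
            "type": "char" if m.group("type").upper() == "X" else "numeric",
        })
        offset += length
    return layout


def split_record(record: str, layout: list[dict]) -> dict[str, str]:
    """Slice one fixed-width record according to the layout."""
    return {f["name"]: record[f["start"] - 1:f["start"] - 1 + f["length"]]
            for f in layout}
```

For a copybook declaring `CUST-ID PIC X(6)`, `CUST-NAME PIC X(10)`, and `BALANCE PIC 9(7)` under a `01` group item, the fields start at positions 1, 7, and 17.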

Enter transformations (fanfare please).

To “jump-start” a DI Acceleration project, Metadata Interchange will import the transformations defined in an ETL tool into DMExpress, giving developers a head start on rebuilding that DI/ETL processing in DMExpress. Metadata Interchange will import the transformations by constructing DMExpress Source/Sort/Copy/Aggregate/Join/etc. task(s) and then connecting these tasks in a DMExpress job to perform the equivalent processing. Will we get it 100% right? No. As the saying goes, only death and taxes are 100% certain. However, when you watch this working in our labs…it’s amazing!
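The conversion step can be pictured as mapping each imported transformation to a DMExpress task type and chaining the resulting tasks into a job. The Python sketch below assumes a simple linear pipeline; the transformation kinds, the `TASK_FOR` table, and `convert_pipeline` are all hypothetical, and steps with no known equivalent are set aside for a developer rather than guessed at, in keeping with the “not 100%” caveat.

```python
from dataclasses import dataclass, field

# Hypothetical table from transformation kinds exported by an ETL tool to the
# DMExpress task types named in this post. Illustrative only, not a real mapping.
TASK_FOR = {
    "reader": "Source",
    "sorter": "Sort",
    "expression": "Copy",
    "aggregator": "Aggregate",
    "joiner": "Join",
}


@dataclass
class Job:
    tasks: list[str] = field(default_factory=list)
    links: list[tuple[str, str]] = field(default_factory=list)
    unconverted: list[str] = field(default_factory=list)


def convert_pipeline(steps: list[tuple[str, str]]) -> Job:
    """Turn an ordered list of (step_name, kind) into linked tasks.

    Steps with no known equivalent are listed for manual follow-up rather than
    guessed at, and the chain is broken there so the gap stays visible.
    """
    job = Job()
    previous = None
    for name, kind in steps:
        task_type = TASK_FOR.get(kind.lower())
        if task_type is None:
            job.unconverted.append(name)
            previous = None  # do not silently bypass the unconverted step
            continue
        task = f"{task_type}:{name}"
        job.tasks.append(task)
        if previous is not None:
            job.links.append((previous, task))  # previous task feeds this one
        previous = task
    return job
```

Breaking the chain at an unconverted step is a deliberate design choice in this sketch: a developer sees exactly where hand-work is needed instead of a job that looks complete but silently skips processing.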

Probably the most important part of metadata interchange is maintaining data lineage. Whether users are using DMExpress to augment the performance of other DI/ETL tools or using it for all of their ETL, they cannot afford to lose data lineage, which they need for reporting and compliance. Metadata interchange will provide this lineage, capturing where the data originated and how it was transformed, and will allow it to be exported via the bridge.
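Lineage is essentially a graph from each target column back to the columns that feed it. This Python sketch assumes lineage is stored as a hypothetical `target -> (sources, transformation)` dictionary with made-up column names, and walks it breadth-first to recover where a value originated and which transformations were applied along the way.

```python
from collections import deque

# Hypothetical lineage records: target column -> (source columns, transformation).
LINEAGE = {
    "dw.sales.total_usd": (["stage.orders.amount", "ref.fx.rate"], "amount * rate"),
    "stage.orders.amount": (["crm.orders.amt"], "copy"),
}


def trace_origins(column: str, lineage=LINEAGE) -> tuple[set[str], list[str]]:
    """Walk lineage edges backwards (breadth-first) to the original sources.

    Returns (origin columns, transformations applied along the way).
    """
    origins: set[str] = set()
    steps: list[str] = []
    queue, seen = deque([column]), {column}
    while queue:
        current = queue.popleft()
        if current not in lineage:
            origins.add(current)  # nothing feeds it, so it is an original source
            continue
        sources, transform = lineage[current]
        steps.append(f"{current} <- {transform}")
        for src in sources:
            if src not in seen:  # guard against revisiting shared inputs
                seen.add(src)
                queue.append(src)
    return origins, steps
```

Tracing `dw.sales.total_usd` yields the origins `crm.orders.amt` and `ref.fx.rate`, plus the two transformations recorded between them.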

Syncsort is using the same metadata “Switzerland” that almost every other data vendor in the industry (DI, BI, modeling, etc.) uses. As one of the top DI analysts said to us recently, this is “a really big deal” for Syncsort, and we believe it makes DI acceleration that much more compelling and credible for the market.
