acceleration

For several years now I’ve been lucky enough to attend and present at MicroStrategy’s biggest user conference, MicroStrategy World, where thousands of professionals from many industries and countries meet for a week to talk about the latest trends in technology and, more specifically, business intelligence. (In the spirit of full disclosure, I’m a former MicroStrategist.)

This year, the meeting place was The Wynn in Las Vegas. As always, it was great to catch up with friends, colleagues, customers and partners. Moreover, it was equally exciting to see how business intelligence continues to reinvent itself. This year it looks more energized than ever thanks to advancements in mobile, social, cloud and Hadoop.

I’m sure there will be countless blogs and commentary about conference happenings. Therefore, I want to provide what I hope are different and valuable “takeaways” from the conference: a view through the data integration looking glass. So here we go:

1. Mobile and Social Intelligence are key drivers for Big Data. Mobile and social media are creating unprecedented amounts of information. Every Facebook check-in, every like, every comment on social media provides valuable information about consumer preferences, sentiment, habits, networks, etc. Organizations that can leverage this data will have an edge over the competition.

2. Transforming data is the key to the fourth “V” in Big Data. Transformations, the “T” in ETL, are still one of the most critical challenges organizations face today as they try to leverage Big Data. With increasing volume, velocity and variety of data, finding ways to capitalize on the elusive fourth “V,” value, will become even more important. Organizations capable of transforming more data in less time, with fewer resources, will be able to answer more “big questions” and provide better products and services to their customers.

3. The elephant in the room is Hadoop. Hadoop has quickly emerged as the framework of choice for Big Data processing and analytics. As such, it is playing a key role in making data processing affordable and disrupting the status quo. During his presentation at “World,” Amr Awadallah, CTO and co-founder of Cloudera (a Syncsort partner), talked about how companies are offloading the “T” from expensive proprietary databases to Hadoop. Such a move can shift the economics of data processing from as much as $100K/TB to as little as $1K/TB. As organizations implement Hadoop initiatives as a means to scale and reduce costs, they will need technologies to help them unlock Hadoop’s potential.

4. Don’t blame the messenger. In many cases, BI performance and data freshness are a data integration problem. Unfortunately, users often blame the tool that presents the information, in this case, the BI tool. However, more people are starting to realize that behind every successful BI or data warehousing project, there’s a strong ETL foundation. This is especially important when it comes to keeping BI data fresh. Therefore, it is critical to build a high-performance, scalable ETL environment that can seamlessly grow to suit the future needs of your organization.

5. Big Data requires new approaches. During a presentation by Netflix, one attendee asked why Netflix wasn’t using an enterprise data warehouse for Big Data (one that is known to be very scalable but also expensive). The presenter’s answer was simple but devastating: the so-called data warehouse would never be able to reach the levels of scalability Netflix required. My take? Not all organizations manage petabytes of data as Netflix or comScore (another Syncsort customer) do. However, they can still benefit from Big Data architectures. As organizations evolve their data processing environments, it’s important to adopt smarter approaches to data integration. A smart approach is one that will scale with the requirements of the business and will deliver results for fewer dollars and with fewer resources. This is exactly what a leading healthcare organization (another Syncsort customer) presented at “World” this year. As they migrated from their legacy ETL tool, they gained faster performance, better standards and best practices, faster deployment times, and enhanced scalability for future growth.

2013 is indeed looking like the year of Big Data, and MicroStrategy World provided more proof of how organizations are quickly embracing the “new normal.”


Managing unprecedented data growth while keeping up with performance requirements is one of the major challenges organizations face in the era of ‘Big Data.’ In their quest to leverage information to properly support business objectives, IT departments have historically turned to inefficient workarounds that create significant complexity and escalating costs. In fact, according to a recent research report by the analyst firm Enterprise Strategy Group, data integration complexity was the number one data analytics challenge, cited by over 270 respondent organizations.

That’s why more and more organizations are embarking on new initiatives to address a fundamental problem: accelerating performance while reducing the cost and complexity of their data integration environments. ETL 2.0, a new approach Syncsort recently started advocating, is our answer to this challenge. Today, we are pleased to announce DMExpress 7.0, the foundation that we believe will start to make ETL 2.0 a reality for thousands of organizations.

DMExpress 7.0 incorporates key enhancements that further extend our leadership as the only ETL vendor that can provide a fast, efficient, simple, cost-effective approach to data integration. More importantly, the 7.0 release allows organizations to leverage more data than ever before while maintaining the flexibility, ease-of-use, and cost efficiencies required by the business.  The release includes significant enhancements in three key areas:

Scalable, self-tuning ETL Engine. This is the core of our software and a clear differentiator for Syncsort. Having a scalable, self-tuning ETL engine enables organizations to process data transformations in memory within the ETL engine, eliminating the need for staging and freeing up database capacity.  DMExpress 7.0 includes significant performance enhancements and I/O optimizations that deliver even faster performance at scale.

Enterprise Connectivity & Acceleration. DMExpress provides the ability to leverage virtually all sources of data as well as accelerate previous investments in data integration environments. DMExpress 7.0 extends Syncsort’s leadership in this area by delivering:

  • Tighter integration with Hadoop, including Load/Extract capabilities
  • Faster, easier data integration acceleration through bulk metadata import capabilities
  • Improved support for mainframe application modernization projects with expanded JCL SORT control statement support and out-of-the-box integration with leading re-hosting solutions

Simple, Agile ETL Development. We believe installing and configuring data integration tools should not be a lengthy project in its own right. In a matter of days, you should be able to install, develop and deploy your designs into production. That’s why DMExpress 7.0 extends the software’s ease of use, providing features that encourage reusability and collaboration to increase overall productivity.

A couple of months ago at TDWI World in San Diego, Syncsort defined a new approach to ETL, one that would deliver on the long overdue promises of data integration tools. We called this approach ETL 2.0.  Today from TDWI World in Orlando, we announce DMExpress 7.0, the first solution in the market to effectively deliver a completely new approach to data integration, the foundation of ETL 2.0.

DMExpress 7.0 will be generally available in December. In the meantime, stay tuned for more details on our upcoming DMExpress 7.0 release, straight from our R&D labs!


Today, Syncsort announced our strategy and entry into the Hadoop community.  This is really exciting, as our customers have told us they are moving more and more work into Hadoop as “big data” grows in their enterprise and the need to scale becomes even more critical for their businesses.  Our developers are really excited about what Syncsort is doing, as well.  Even though we are an East Coast-based company, several of them are even threatening to dye their hair purple…Hadoop purple!

Our announcement has two major components to it.  The first part is that we intend to contribute an external sort “plug-in” to the community.  There have been calls in the past for performance enhancements and other optimizations to sort.  With this contribution, anyone could seamlessly plug their own sort engine into Hadoop by using the published interfaces, including Syncsort’s solution (more details on that below).  With Syncsort’s 40+ years of experience in sorting, we believe we have unique expertise we can apply for the benefit of the larger Hadoop community.
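To make the plug-in idea a little more concrete, here is a minimal Python sketch of what a pluggable sort contract can look like. This is not Hadoop's actual interface and not Syncsort's engine: the names `Sorter`, `BuiltinSorter`, `ExternalMergeSorter` and `run_sort_phase` are hypothetical, and the alternative engine is a textbook external merge sort used purely as a stand-in.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List, Protocol


class Sorter(Protocol):
    """Hypothetical plug-in contract: accept records, yield them in order."""

    def sort(self, records: Iterable[str]) -> Iterator[str]: ...


class BuiltinSorter:
    """Default engine: sorts everything in memory."""

    def sort(self, records: Iterable[str]) -> Iterator[str]:
        return iter(sorted(records))


class ExternalMergeSorter:
    """Stand-in engine: sorts small runs, spills them to disk, then merges.

    Assumes records are single-line strings (no embedded newlines).
    """

    def __init__(self, run_size: int = 2) -> None:
        self.run_size = run_size

    def sort(self, records: Iterable[str]) -> Iterator[str]:
        paths: List[str] = []
        run: List[str] = []
        for record in records:
            run.append(record)
            if len(run) >= self.run_size:
                paths.append(self._spill(run))
                run = []
        if run:
            paths.append(self._spill(run))

        files = [open(path) for path in paths]
        try:
            # Strip newlines before merging so comparisons use the raw records.
            streams = [(line.rstrip("\n") for line in f) for f in files]
            yield from heapq.merge(*streams)
        finally:
            for f in files:
                f.close()
            for path in paths:
                os.remove(path)

    @staticmethod
    def _spill(run: List[str]) -> str:
        """Write one sorted run to a temporary file and return its path."""
        fd, path = tempfile.mkstemp(text=True)
        with os.fdopen(fd, "w") as out:
            out.writelines(record + "\n" for record in sorted(run))
        return path


def run_sort_phase(records: Iterable[str], sorter: Sorter) -> List[str]:
    """The framework only sees the contract, so engines are interchangeable."""
    return list(sorter.sort(records))
```

The point of the sketch is the last function: as long as the framework calls a published contract, swapping `BuiltinSorter` for another engine requires no change to the job itself.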

While other data integration vendors are talking about Hadoop, we have not seen any of them embrace the community by making contributions. We believe this distinguishes Syncsort’s entry into the community and hope that it is viewed as a sign of our sincerity and excitement around working with the open source community and customers to truly make Hadoop better and even more valuable than it is today.

The second part of our announcement is the new DMExpress Hadoop Edition.  Entering a limited availability beta period in June, this new offering will encompass three components:

  1. HDFS connectivity: extract and load HDFS.  We actually can do this today with examples we ship in the product.  If you’re a DMExpress customer, check this out in the online help.
  2. The sort acceleration piece from our contribution (discussed above) to actually improve the sort performance.  Our marketing team (which I think is also dabbling in the purple hair thing) is calling this Hadoop Acceleration.  While we are contributing the plug-in, the actual sort engine from Syncsort will ship in this new DMExpress Hadoop Edition.  As you can see from our announcement, we have seen some pretty good performance improvements.  We will continue to benchmark our acceleration throughout the beta period.  Stay tuned to this blog for more results.
  3. The ability to create MapReduce jobs in the DMExpress graphical environment, rather than writing Java, Pig scripts, etc.  If you know DMExpress, this is the Task Editor.  If you need to write data transformations, reformat data, build aggregations, etc., you can now use our Task Editor.  DMExpress will automatically deploy on the Hadoop cluster, reading from HDFS and running the transformations across the cluster.  Not only is the processing faster, but the jobs are also much easier to write and maintain.
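
For readers less familiar with what such a job does under the covers, here is a plain-Python sketch of the map, sort and reduce steps behind a simple aggregation. It is only an illustration of the pattern: it uses neither Hadoop nor DMExpress, the function names are hypothetical, and the sample records are made up.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Map: parse each 'region,amount' line into a (key, value) pair."""
    for line in lines:
        region, amount = line.strip().split(",")
        yield region, float(amount)


def reduce_phase(pairs):
    """Sort the pairs by key, then reduce each key's values to a total."""
    ordered = sorted(pairs, key=itemgetter(0))
    for region, group in groupby(ordered, key=itemgetter(0)):
        yield region, sum(value for _, value in group)


sample = ["east,10.0", "west,5.5", "east,2.5"]  # made-up input records
totals = dict(reduce_phase(map_phase(sample)))
print(totals)
```

Note that a sort sits between the map and reduce steps, which is why sort performance matters so much to the overall job.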

This is obviously just the beginning.  I am very excited about our announcement today and our entry into the Hadoop community.  We have received overwhelmingly positive feedback from our customers and the industry analysts we have briefed.  Stay tuned for more details and results from our beta testing.  I even promise to post pictures of any Syncsort developers or marketing folks that actually follow through with the purple hair!


As I leave “Vegas for kids” (aka Orlando), I realize that I have seen the light.  That light is metadata or, more specifically, metadata interchange. Metadata interchange is critical for our customers and for Syncsort. As the well-known television commercial says, “this changes everything.”

In my previous post on DMExpress 6.5, I talked about this concept of “snap”.  I want to add some more detail about this here.

The first place to start a DI Acceleration project, or any DI/ETL project for that matter, is to understand the source systems being read and the target system(s) being written.  Metadata Interchange will allow users to import the source and target schemas from DI/ETL tools (Informatica PowerCenter, DataStage, etc.), databases, files, COBOL copybooks (really?!?), etc.  This will allow users to match the metadata in a DI Acceleration scenario.  In DMExpress, this corresponds to individual sources and targets (file, pipe, database table, table in memory, etc.) and their associated metadata (delimited field layout, fixed position field layouts, XML schemas, extracted database columns, COBOL copybooks, etc.).  Pretty cool so far!
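
As a rough illustration of what “matching the metadata” can mean in practice, the Python sketch below compares a source field layout with a target layout and reports the differences. The `Field` class, the `match_layouts` function and the sample layouts are all hypothetical; this is not the Metadata Interchange format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    dtype: str


def match_layouts(source, target):
    """Compare two field layouts by case-insensitive name and report gaps."""
    src = {f.name.lower(): f for f in source}
    tgt = {f.name.lower(): f for f in target}
    common = [name for name in src if name in tgt]
    return {
        "matched": sorted(n for n in common if src[n].dtype == tgt[n].dtype),
        "type_mismatch": sorted(n for n in common if src[n].dtype != tgt[n].dtype),
        "missing_in_target": sorted(n for n in src if n not in tgt),
        "missing_in_source": sorted(n for n in tgt if n not in src),
    }


# Made-up layouts, e.g. one imported from a copybook and one from a table.
source = [Field("CUST_ID", "int"), Field("NAME", "string"), Field("BALANCE", "decimal")]
target = [
    Field("cust_id", "int"),
    Field("name", "string"),
    Field("balance", "float"),
    Field("load_ts", "timestamp"),
]
report = match_layouts(source, target)
print(report)
```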

Enter transformations (fanfare please).

To “jump start” a DI Acceleration project, Metadata Interchange will import the transformations performed in an ETL tool into DMExpress.  This will assist in developing the DI/ETL in DMExpress.  Metadata Interchange will import the transformations by constructing DMExpress Source/Sort/Copy/Aggregate/Join/etc. task(s) and then connecting these tasks in a DMExpress job to perform the equivalent processing.  Will we get it 100%?  No.  As the saying goes, only death and taxes are a 100% certainty.  However, when you watch this working in our labs…it’s amazing!
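
The general idea of turning imported transformation descriptions into connected tasks can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the step dictionaries, the `BUILDERS` table and `build_job` are hypothetical and do not reflect how Metadata Interchange or DMExpress represent jobs. Steps with no equivalent are reported rather than silently dropped, in the spirit of “we won’t get it 100%.”

```python
from itertools import groupby


def sort_task(key):
    """Build a task that orders rows by one field."""
    return lambda rows: sorted(rows, key=lambda r: r[key])


def aggregate_task(key, value):
    """Build a task that sums `value` for each distinct `key`."""
    def run(rows):
        ordered = sorted(rows, key=lambda r: r[key])
        return [
            {key: k, value: sum(r[value] for r in grp)}
            for k, grp in groupby(ordered, key=lambda r: r[key])
        ]
    return run


# Hypothetical mapping from imported step types to task builders.
BUILDERS = {"sort": sort_task, "aggregate": aggregate_task}


def build_job(steps):
    """Chain the supported steps into one job; list the ones needing manual work."""
    tasks, unsupported = [], []
    for step in steps:
        builder = BUILDERS.get(step["type"])
        if builder is None:
            unsupported.append(step["type"])
        else:
            tasks.append(builder(**step["args"]))

    def job(rows):
        for task in tasks:
            rows = task(rows)
        return rows

    return job, unsupported


# Made-up description of a flow imported from another tool.
imported = [
    {"type": "aggregate", "args": {"key": "region", "value": "amount"}},
    {"type": "sort", "args": {"key": "amount"}},
    {"type": "custom_lookup", "args": {}},
]
job, unsupported = build_job(imported)
result = job([
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 2},
])
print(result, unsupported)
```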

Probably the most important piece of metadata interchange is maintaining the data lineage.  Whether users are using DMExpress as a performance augmentation to other DI/ETL tools or for all ETL, they cannot afford to lose the data lineage needed for reporting and compliance purposes.  Metadata interchange will provide this data lineage, so users can understand and maintain where the data originated and how it was transformed, and can export it via the bridge.
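
To show what a lineage question looks like, here is a small Python sketch that walks upstream from a reported field to its origins. The `LINEAGE` records, field names and `trace` function are invented for the example; they are not the format the bridge exports.

```python
# Hypothetical lineage records: each derived field maps to
# (transformation description, list of input fields).
LINEAGE = {
    "report.total_sales": ("SUM(amount) GROUP BY region", ["staging.amount"]),
    "staging.amount": ("CAST(amt AS decimal)", ["orders_file.amt"]),
}


def trace(field, lineage):
    """Walk upstream from `field`; return the transformation steps and the origins."""
    steps, origins = [], set()
    stack, seen = [field], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current not in lineage:
            # No recorded upstream: treat this field as an original source.
            origins.add(current)
            continue
        transformation, inputs = lineage[current]
        steps.append((current, transformation))
        stack.extend(inputs)
    return steps, origins


steps, origins = trace("report.total_sales", LINEAGE)
print(steps, origins)
```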

Syncsort is using the same “Switzerland” of metadata vendors that almost every other data vendor (DI, BI, modeling, etc.) uses in the industry. As one of the top DI analysts said to us recently, this is “a really big deal” for Syncsort and we believe it makes DI acceleration that much more compelling and credible for the market.
