
Earlier this year, I kicked off the “proof is in the pudding” blog series as a way to share results that DMExpress is achieving during proof of concepts (POCs) in real customer environments. The idea is to wow the loyal readers of the Syncsort blog with information about DMExpress’ speed, efficiency and ease of use.  

It has been too long since I contributed to the series, but I promise to start posting more frequently. We’ve got a lot of exciting work going on behind the scenes and plenty of information to share.

For this post, I want to focus on a recent POC involving a customer that was running up against their nightly batch window. If there was any failure at all during the evening, the customer would not be able to refresh the data warehouse, leaving business users with data that was more than 24 hours old. This was simply not acceptable to the business, and we knew that DMExpress was just the right solution for the job.

For this POC, the environment consisted of a four-core UNIX box with ETL coded in PL/SQL (while that's really ELT, please forgive the semantics for right now). Another challenge this customer had was that this particular ETL flow involved nearly 900 lines of PL/SQL, which was incredibly complex and nearly impossible to maintain. In fact, they really only had one person capable of maintaining it. What happens if he goes away? Hopefully this doesn't sound too familiar to you!

The stated goal of the POC was to reduce elapsed processing time by 33%. Additionally, we were looking to demonstrate that DMExpress could significantly reduce the complexity of building and maintaining the ETL.

The particular job involved five data sources and required identifying changed records, performing multiple joins, enriching the information via lookups, and loading the database. The POC ran on approximately 350,000 records, a relatively small amount of data. However, as you are about to find out, the results were quite impressive!
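To make the shape of that flow concrete, here is a minimal, hypothetical sketch of the three processing steps in Python. All of the data, field names, and helper functions below are invented for illustration; the actual POC was implemented in DMExpress, not hand-written code.

```python
# Hypothetical sketch of the POC's processing steps: detect changed
# records against a prior snapshot, join against a second source on a
# key, and enrich rows via a lookup table. All names are illustrative.

def changed_records(previous, current):
    """Return rows from `current` that are new or differ from `previous`."""
    prev_by_key = {row["id"]: row for row in previous}
    return [row for row in current if prev_by_key.get(row["id"]) != row]

def join_on_id(left, right):
    """Inner-join two row lists on the `id` field."""
    right_by_key = {row["id"]: row for row in right}
    return [{**l, **right_by_key[l["id"]]} for l in left if l["id"] in right_by_key]

def enrich(rows, lookup, field, new_field):
    """Add `new_field` to each row by looking up `field` in `lookup`."""
    return [{**row, new_field: lookup.get(row[field], "UNKNOWN")} for row in rows]

# Toy data: one changed row (id 2) and one new row (id 3).
previous = [{"id": 1, "amount": 100}, {"id": 2, "amount": 200}]
current  = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}, {"id": 3, "amount": 300}]
regions  = [{"id": 2, "region_code": "EU"}, {"id": 3, "region_code": "NA"}]
region_names = {"EU": "Europe", "NA": "North America"}

delta    = changed_records(previous, current)   # only ids 2 and 3 pass through
joined   = join_on_id(delta, regions)
enriched = enrich(joined, region_names, "region_code", "region")
print(enriched)
```

The point of the sketch is simply that each step is a straightforward set operation; it is the volume, the number of steps, and the hand-coded PL/SQL that made the original process slow and hard to maintain.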

The original process was taking 90 minutes, so the 33% reduction that the POC targeted meant that we had to reduce it to 60 minutes. How did DMExpress do? How about only 6 minutes! That's a 15x improvement in throughput and a 93% reduction in elapsed time for those of you keeping score at home.
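For anyone checking the score at home, the arithmetic works out like this:

```python
# Check the numbers above: run times are in minutes.
original = 90
dmexpress = 6

speedup = original / dmexpress                 # throughput improvement
reduction = (original - dmexpress) / original  # elapsed-time reduction

print(f"{speedup:.0f}x faster, {reduction:.0%} less elapsed time")
# → 15x faster, 93% less elapsed time
```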

How about the 900 lines of PL/SQL? We took that and converted it into just 2 DMExpress jobs, now built and able to be maintained in a simple, easy-to-use graphical user interface. Needless to say, the customer was impressed.

Stay tuned for more results in the days and weeks ahead. In the meantime, don’t be shy about posting comments and questions.

We are also still open to taking on any challengers willing to put their solutions up head-to-head versus DMExpress in a benchmark. Of course, with results like the ones I’ve shared above, I guess it’s not a big surprise that we haven’t had any takers on that just yet…


Thanks to a recent post by my colleague Steven Totman, some of you may already know that I had the chance to participate on a panel about “The Breakpoints of Big Data” at the FIMA conference in London. One thing that really caught my attention during the entire conference was the number of attendees from the business side of the house. I think this represents further evidence that the information technology landscape continues to change dramatically. More than ever, the business stakeholders are willing to get their hands dirty to learn, participate and immerse themselves in the world of data management and information technology.

Some IT organizations may feel threatened by this trend and may choose to ignore it or minimize it. However, many others are already embracing it with significant benefits to the business. Ultimately, true collaboration between business users and IT empowers companies to make sense of ‘Big Data.’ I believe this holds true for most aspects of IT, especially those that are more strategic like analytics, business intelligence and, of course, data integration.

By enabling greater levels of collaboration between business users and IT, organizations can make a great leap forward and unleash the opportunities of ‘Big Data’ by:

  • Bringing together the right set of skills to understand the data and focus on the right areas
  • Accelerating development cycles, providing the business with the agility it needs to capitalize on opportunities faster than the competition
  • Understanding the business impact of diverse data services to prioritize resources and work towards a common goal
  • Generating more knowledgeable and satisfied business users, resulting in greater ROI and user adoption

That said, there is no perfect world. The new dynamics will require proper tools that allow and facilitate greater levels of collaboration, self-service, and reusability. That is why these concepts are core to Syncsort's ETL 2.0 approach. Even then, there will always be sources of friction and areas of inefficiency. In this regard, it is really no different from a couple that has been married for a long time.

The business user has arrived and there is no turning back. The good news for IT is that they just might have found the perfect ally for taming ‘Big Data’ and capitalizing on the opportunities it presents to organizations of all sizes.


ETL 2.0: A New Beginning

August 8, 2011

Data Integration tools, as we know them today, are failing. Ten years ago, they promised a simple way to extract data from multiple, disparate sources; transform it into critical insights; and load it into a common repository where business users would leverage it for competitive advantage.

The truth is, most data integration tools can no longer cope with the increasing demands for information. Big Data and mobile computing are exposing the shortcomings of most data integration and ETL tools. A new approach is needed, a new beginning.

That is why today, Syncsort is announcing a new concept that promises to fundamentally change the way people do data integration. We call it ETL 2.0.

ETL 2.0 is about bringing the “T” back to ETL. It’s about realizing the insanity of ELT: a practice that costs millions of dollars a year in database capacity and IT staff productivity (for more details you can read my previous entries Friend or Foe: A Tale of Big Data and Data Integration and ETL vs. ELT: A Tale of Staging and Set Based Processing). We believe data integration tools should be more than just expensive schedulers that push all transformations down to the database. But to do that you need fast, efficient, in-memory engines to deal with the demands of Big Data.

ETL 2.0 is also about bringing the business user closer to their data. During the past 10 years, Business Intelligence tools have done a great job of closing the gap between the user and IT. Reusability, self-service, collaboration, and most recently Mobile BI have all contributed to this effect. Unfortunately, data integration still remains largely isolated from the business users; communication is minimal, collaboration almost non-existent. It is time to close that gap and empower business users by involving them early in the design and throughout the entire development process. Only then will IT organizations be able to adapt and respond to business demands in time, and only then will user satisfaction increase.

In the end, ETL 2.0 is about achieving strategic business objectives and lowering the total cost of owning and maintaining your data integration environment.

From TDWI World in San Diego, this is our vision for a new beginning. We hope you find it interesting! If you do, please stay tuned for more details…
