Crossing the Data Integration Tools Chasm
Recently, I came across a great article by Rick Sherman in the August issue of TechTarget’s BI Trends and Strategies (starts on page 7). In his piece, "Misconceptions Holding Back Use of Data Integration Tools," Rick makes a strong case for the benefits that data integration tools can bring to organizations and identifies the key misconceptions that have hindered their wider adoption. TechTarget has also published an excellent Q&A with Rick on the topic that you can read here.
Rick points out that despite advances in the functionality and performance of data integration (DI) technology, many organizations still hand-code their DI projects. Without giving away too much, Rick cites two key reasons: a lack of understanding of what today’s data integration tools can do, and the perception that SQL can solve every DI task.
I agree with both of these statements. I would also argue that, even today, the vast majority of DI tools have failed to deliver on their promises of faster performance and greater IT productivity.
Over the years, many DI vendors have focused on acquiring a vast array of disparate technologies. As a result, they’ve added tons of functionality but have also made their stacks heavier, more complex, and very difficult to understand and use. This is especially true when high-performance data integration is required. Most DI tools demand significant skills and ongoing manual tuning to achieve and sustain acceptable performance, which only adds to an ever-expanding IT backlog. It is therefore hard for IT professionals to justify the investment, let alone realize its full potential. Instead, they often revert to what they know best, which in this case tends to be hand-coding SQL.
Unfortunately, as Rick mentions, SQL coding is far from being the answer to all DI tasks. As I explained in a previous post on ETL vs. ELT, SQL was originally designed to solve problems that involve a "big" question with a "small" answer (i.e., analytical queries). With ETL problems, however, the answer is generally as big as, if not bigger than, the question. In most cases, therefore, SQL is not the best approach for DI tasks. The fact that companies still prefer to code SQL in spite of this may point to another problem: perhaps organizations do not feel the need for much of the functionality packed into many of today’s DI platforms. Core ETL still accounts for most of the DI challenges that organizations face today, and it remains their most fundamental issue.
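To make the "big question, small answer" distinction concrete, here is a toy sketch using Python's built-in sqlite3 module. The `sales` table and its contents are hypothetical and exist only for illustration; the point is to compare input and output sizes, not to benchmark anything.

```python
import sqlite3

# Hypothetical "sales" table, created in memory purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, region TEXT, amount REAL)")
rows = [(i, "EU" if i % 2 else "US", float(i)) for i in range(1, 10001)]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# Analytical query: scans every row (the "big" question)
# but returns one row per region (the "small" answer).
analytic = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall()

# ETL-style transformation: every input row produces an output row,
# so the answer is as big as the question.
transformed = conn.execute(
    "SELECT id, UPPER(region), ROUND(amount * 1.2, 2) FROM sales"
).fetchall()

print(len(rows), len(analytic), len(transformed))
conn.close()
```

The aggregate condenses 10,000 input rows into 2 output rows, while the row-by-row transformation emits all 10,000, which is the kind of workload the post argues SQL was not primarily designed for.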
Syncsort, with our DMExpress solutions, has identified these and other challenges that organizations commonly face when performing data integration. That is why our focus is on delivering the fastest performance at scale without the need to write complex SQL or constantly tune the DI environment, all packaged within a very small software footprint. In fact, as my colleague Keith Kohl has shown in his "proof is in the pudding" blog series, we’ve been helping many organizations move away from complex, inefficient SQL coding to fast, efficient, simple and cost-effective data integration.
In the end, I agree with Rick: the first step for organizations is to acknowledge the problem. I believe the era of Big Data might be the catalyst that finally pushes organizations to rethink their DI strategies. Could Big Data signal the beginning of the end for manual SQL coding? For more on that, you can read my thoughts in a recent article I wrote for IT Briefcase.