Offload the Data Warehouse to Hadoop and Save Millions
The Data Warehousing Institute (TDWI) recently released a white paper and survey called Integrating Hadoop into Business Intelligence and Data Warehousing. It quoted an IT director telling a story we hear frequently. In fact, it is a trend.
“We know the things each platform (relational database EDW, data warehouse appliance, and Hadoop) does well, and we know approximately how much managing a terabyte of data costs on each. Plus, we understand how the value of data varies from one data set, source, or topic to the next. So we manage big data and other data on the cheapest platform that will get the job done. As we get better at this, we find ourselves managing and processing more data on HDFS and the appliance, and less on the EDW. We’ve even reduced the EDW license—much to the chagrin of the vendor!”
- Rick Miller, Sr. Director, AOL
As data volumes have grown exponentially, ETL tools have become more like schedulers that push resource-consuming transformations down into the data warehouse, performing ELT instead. In addition, large amounts of the data that is loaded and transformed go unused by the business. This dormant data not only takes up storage capacity; more significantly, it wastes CPU and I/O processing capacity on running ELT in the data warehouse for data that nobody uses.
For many firms, if a platform exists that can ‘get the job done’ and is 1/10th the cost, today’s economic environment will force a company’s hand. Today, most types of data transformations can be done on Hadoop faster, in a more scalable manner and at significantly lower costs. The emerging model is to offload data transformations to Hadoop, store and maintain relevant data on Hadoop, and load only the needed transformed data to the enterprise data warehouse (EDW).
In all likelihood, other firms are already migrating enterprise data to Hadoop. If they are not testing it now, they will be soon, as “…a whopping 51% say they’ll have [a production Hadoop implementation] within three years.” The time to start is now. Why let your competitors leverage lower IT costs that lead to higher net margins? Early adopters and fast followers have already started. At this point, beginning to migrate unused data is the step that keeps you in the middle of the herd.
There is even a benefit to the EDW or the Data Warehouse Appliance in all this. The TDWI study expressed it like this: “Some early adopters offload as many workloads as they can to HDFS and other Hadoop technologies because they are less expensive than the average DW platform. The result is that DW resources are freed for the workloads with which they excel”. The data warehouse appliances on the market are amazing pieces of technology. They are purpose-built for running complex analytics at high speed on huge volumes of data. They are often game changers. They only appear slow and use large amounts of CPU when they are asked to do things for which they were not designed, e.g. data transformations. Moving transformations to Hadoop actually allows your data warehouse appliance to return to prominence and do what it is best at doing.
The #1 hurdle found in a recent poll was the challenge of “determining which data to store and process on Hadoop.” Appfluent’s software analyzes workloads and data usage to guide customers on what data to store and process on their EDW and what should be offloaded to Hadoop. A leading industry analyst, explaining why Appfluent is an intuitive investment, said, “Companies who believe in running their core business in a fact-based way should want to run their data warehouses in a fact-based way. That’s why they want Appfluent.”
Appfluent is the only company that can completely analyze how EDW data is used. It enables large enterprises across various vertical industries to reduce costs and optimize performance on that EDW by assessing usage. Appfluent Visibility is a software product that gives you the ability to assess and analyze expensive transformations and workloads, as well as identify unused data. The results can serve as the blueprint for beginning to offload your data warehouse to Hadoop. The product non-intrusively monitors users’ application activity and ELT processes and correlates them with data usage and the associated resource consumption.
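To make the idea of usage-based offloading concrete, here is a minimal sketch of how unused tables might be flagged from a query log. It is an illustration only and does not represent how Appfluent Visibility works. The function name, the table names, and the shape of the query log (one list of referenced tables per query) are all assumptions made for this example.

```python
from collections import Counter


def find_offload_candidates(all_tables, query_log, min_queries=1):
    """Return tables referenced by fewer than `min_queries` logged queries.

    Hypothetical helper: `query_log` is assumed to be a list in which each
    entry lists the tables that one query touched.
    """
    usage = Counter()
    for tables_in_query in query_log:
        # Count each table at most once per query.
        usage.update(set(tables_in_query))
    return sorted(t for t in all_tables if usage[t] < min_queries)


# Hypothetical warehouse inventory and query log (illustrative names only).
all_tables = ["sales_fact", "clickstream_raw", "customer_dim", "staging_2011"]
query_log = [
    ["sales_fact", "customer_dim"],
    ["sales_fact"],
    ["customer_dim", "sales_fact"],
]

candidates = find_offload_candidates(all_tables, query_log)
print(candidates)  # ['clickstream_raw', 'staging_2011']
```

In this toy data, the two tables that no query ever touches are the first candidates for offloading. Raising `min_queries` widens the net to include rarely used tables as well.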
Once you’ve identified just how to be efficient, what next? Syncsort’s partnership with Appfluent is a natural extension of where to go next. Syncsort’s DMX-h product provides the fastest sort technology and the fastest data processing engine in the market, and Syncsort most recently released the first truly integrated approach to extract, transform and load data with Hadoop and even in the cloud.
Learn more about offloading your data warehouse to Hadoop and see a demonstration of Appfluent and Syncsort from our recent webinar.
Written by Santosh Chitakki, VP of Products, Appfluent
----------
“Integrating Hadoop into Business Intelligence and Data Warehousing,” by Philip Russom. 2013. The Data Warehousing Institute. www.tdwi.org
Op. cit.