Data Warehouse Offload: Monetizing Big Data Initiatives

Such an exciting week for Syncsort with our open source initiatives, new release announcements, and our sponsorship and participation at the Hadoop Summit!

This week, we announced several Big Data solutions with a rich set of tools helping businesses unlock the value of their data, whether it is in legacy mainframe or data warehouse storage.

Many organizations are challenged with containing growing costs or meeting continuously changing business requirements while tapping into corporate data assets. These challenges are also the major drivers of the evolving reference architecture for the data warehouse, and they are leading to the adoption of Big Data technologies, in particular Hadoop. To move beyond economic and architectural limitations, organizations are offloading expensive workloads and their associated data to Hadoop.

Syncsort’s solutions offer powerful tools to help with this migration of workloads. This week we announced Project ‘SILQ’, a new utility that analyzes existing SQL (ELT) jobs and provides a visualization of their data flow. Many of the SQL scripts in traditional data warehouse environments were written years ago, and it can be challenging even to understand what they do. As we worked with our customers, we saw how difficult this was for them and partnered with them to offer a new tool. In addition to parsing and analysis, the tool recommends how to re-create the same data flow within Hadoop using Syncsort’s Hadoop ETL solution. Project ‘SILQ’ is SQL-92 compliant with BTEQ extensions and is in private Technical Preview. Extensions for other SQL dialects are planned.
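
To make the idea of recovering a data flow from legacy SQL a little more concrete, here is a minimal Python sketch that pulls table-level lineage edges (source table to target table) out of INSERT ... SELECT statements. It is not how SILQ works, since its internals are not described here; the regular expressions, the `extract_lineage` helper, and the sample script are hypothetical, and regex matching is far too naive for real BTEQ scripts, which need a proper SQL parser.

```python
import re

# Hypothetical, deliberately naive patterns; real SQL and BTEQ scripts need a full parser.
TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([\w.]+)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def extract_lineage(script: str) -> list[tuple[str, str]]:
    """Return unique (source_table, target_table) edges from INSERT ... SELECT statements."""
    edges: list[tuple[str, str]] = []
    # Naive statement split on ';' (does not handle semicolons inside string literals).
    for statement in script.split(";"):
        target = TARGET_RE.search(statement)
        if target is None:
            continue  # only INSERT ... SELECT statements contribute edges here
        for source in SOURCE_RE.findall(statement):
            edge = (source.lower(), target.group(1).lower())
            if edge not in edges:
                edges.append(edge)
    return edges


# Hypothetical two-step ELT script: raw tables -> staging table -> mart table.
script = """
INSERT INTO stage.daily_sales
SELECT s.store_id, SUM(s.amount)
FROM raw.sales s JOIN raw.stores st ON s.store_id = st.store_id
GROUP BY s.store_id;
INSERT INTO mart.sales_summary SELECT * FROM stage.daily_sales;
"""

for source, target in extract_lineage(script):
    print(f"{source} -> {target}")
```

Running it prints three edges: raw.sales -> stage.daily_sales, raw.stores -> stage.daily_sales, and stage.daily_sales -> mart.sales_summary. Edges like these are the raw material for a data-flow diagram, which a graph-drawing tool could then render.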

Syncsort’s new release of its ETL products (DMX/DMX-h) also includes optimizations for faster analytics, providing end-to-end solutions for the modern data architecture. Syncsort products are now certified with HP Vertica Analytics Platform 7. In addition to supporting data integration from a variety of sources, including Teradata, web logs, and the mainframe, Syncsort’s ETL products provide seamless integration for high-performance parallel loads to HP Vertica nodes. The data is automatically partitioned, and multiple streams are loaded in parallel into different Vertica nodes. The number of streams is adjusted dynamically based on the speed and volume of data produced by the ETL processing and, when running in Hadoop MapReduce, on the number of reducers. More data is pushed to the Vertica nodes that are receiving data at a higher speed. All of these optimizations (data partitioning, adaptive use of Vertica cluster capacity, and dynamic load balancing) are automated and require no manual tuning.
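
The post does not say exactly how the ETL engine decides how much data each Vertica node receives. As a rough illustration of "more data is pushed to the nodes receiving data at a higher speed", here is a minimal Python sketch under an assumed policy: keep an exponentially weighted moving average (EWMA) of each node's observed load throughput and split every batch in proportion to it. The function names, node names, smoothing factor `alpha`, and the policy itself are hypothetical, not Syncsort's implementation.

```python
def allocate_rows(batch_size: int, throughput: dict[str, float]) -> dict[str, int]:
    """Split batch_size rows across nodes in proportion to observed throughput (rows/s).

    Uses the largest-remainder method so the per-node shares always sum to batch_size.
    """
    total = sum(throughput.values())
    if total <= 0:
        raise ValueError("need at least one node with positive throughput")
    exact = {node: batch_size * rate / total for node, rate in throughput.items()}
    shares = {node: int(value) for node, value in exact.items()}  # floor of each exact share
    leftover = batch_size - sum(shares.values())
    # Give the remaining rows to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda node: exact[node] - shares[node], reverse=True)
    for node in by_remainder[:leftover]:
        shares[node] += 1
    return shares


def update_throughput(throughput: dict[str, float], node: str, rows: int,
                      seconds: float, alpha: float = 0.3) -> None:
    """Blend a new rows/second measurement into the node's running estimate (EWMA)."""
    observed = rows / seconds
    throughput[node] = alpha * observed + (1 - alpha) * throughput[node]


# Hypothetical starting estimates for three Vertica nodes, in rows/second.
throughput = {"v_node0001": 300.0, "v_node0002": 100.0, "v_node0003": 100.0}
print(allocate_rows(1000, throughput))  # {'v_node0001': 600, 'v_node0002': 200, 'v_node0003': 200}

# v_node0002 just loaded 500 rows in one second, so its estimate rises and its share grows.
update_throughput(throughput, "v_node0002", rows=500, seconds=1.0)
print(allocate_rows(1000, throughput))
```

Smoothing with an EWMA is one simple design choice: a single unusually fast or slow batch does not swing the split too far, and the largest-remainder step guarantees that no rows are lost to rounding.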

We worked closely with the HP Vertica team to provide best-of-breed solutions for our joint customers. We’ve observed customers using ephemeral nodes to separate resources for different types of workloads, namely data loads and analytic queries. When nodes are marked as ephemeral, Vertica automatically rebalances the database, and queries do not use the processing and memory resources on those nodes. Syncsort’s new release gives users the option of loading data to HP Vertica through ephemeral nodes only, without affecting query performance or query latency on the cluster.
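
As a sketch of the "ephemeral nodes only" load option, assume the loader already has a list of cluster nodes tagged with their type. The `Node` class, the `load_targets` helper, the `ephemeral_only` flag, and the node names below are hypothetical, not a Syncsort or Vertica API; the point is simply that load streams are restricted to ephemeral nodes while permanent nodes keep serving queries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    node_type: str  # hypothetical labels for this sketch: "PERMANENT" or "EPHEMERAL"


def load_targets(nodes: list[Node], ephemeral_only: bool) -> list[str]:
    """Return the names of the nodes that should receive load streams."""
    if not ephemeral_only:
        return [node.name for node in nodes]
    targets = [node.name for node in nodes if node.node_type == "EPHEMERAL"]
    if not targets:
        # Fail loudly rather than silently loading onto query-serving nodes.
        raise ValueError("ephemeral_only was requested but the cluster has no ephemeral nodes")
    return targets


cluster = [
    Node("v_node0001", "PERMANENT"),
    Node("v_node0002", "PERMANENT"),
    Node("v_node0003", "EPHEMERAL"),
]
print(load_targets(cluster, ephemeral_only=True))   # ['v_node0003']
print(load_targets(cluster, ephemeral_only=False))  # all three node names
```

Raising an error when no ephemeral node exists is an assumed design choice; a real loader might instead fall back to all nodes with a warning.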

Overall, this tighter and optimized integration simplifies tapping into corporate data assets and provides higher scalability and more efficient use of resources.

We are fully focused on providing highly optimized solutions with a simple interface that handles performance tuning without requiring users to turn hundreds of knobs, and on increasing the adoption of big data technologies within organizations while leveraging existing skill sets. The new releases and tools continue to deliver operational efficiency, accelerating time-to-insight and reducing the total cost of ownership for big data analytics, while helping to offload expensive workloads from traditional data warehouses to Hadoop.
