
The Customer is Always Right: partnering to extract Big Data value

When developing new features at Syncsort, we always prefer to work closely with our customers. From gathering their input on priorities to partnering with them on the design of new functionality, our interactions with customers are among the most rewarding and important parts of our job.

During a recent discussion with one customer, we had the pleasure of hearing great feedback about a new feature we delivered. The feature lets users partition target files by specifying the size each partition should have after compression is applied.

Why is the new feature important?

This customer processes trillions of records a day. Compressing the data saves a great deal of I/O and, ultimately, wall-clock time. With Hadoop, the data can be processed in parallel as long as it is stored in self-contained partitions. Choosing the best compression algorithm is important for making the data as compact as possible, and the fewer and fuller the partitions, the fewer mappers are needed to process the data.

Our customer knew the optimal size of each partition of compressed data, but making sure that this optimal size was achieved was not trivial. The customer had created some heuristics based on the typical compression ratio of their data, but this was a complex solution that yielded less than ideal results.
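To illustrate why a ratio-based heuristic can miss its target, here is a minimal Python sketch. It is not the customer's actual heuristic, which is not described here. The function names, the 4:1 "typical" ratio, the 4 KB target, and the sample records are all assumptions chosen for illustration, and zlib stands in for whatever codec is really in use.

```python
import random
import zlib


def records_per_partition(target_compressed_bytes, avg_record_bytes, typical_ratio):
    """Hypothetical heuristic: guess how many raw records fit in one partition.

    typical_ratio is the assumed raw-size / compressed-size ratio of the data.
    """
    raw_budget = target_compressed_bytes * typical_ratio
    return max(1, int(raw_budget // avg_record_bytes))


def compressed_size(records):
    """Actual size of the records once concatenated and zlib-compressed."""
    return len(zlib.compress(b"".join(records)))


# Assumed numbers: 4 KB compressed target, 32-byte records, 4:1 typical ratio.
TARGET = 4096
n = records_per_partition(TARGET, avg_record_bytes=32, typical_ratio=4.0)

# Two batches with the same record count and record length, but different content.
rng = random.Random(0)
repetitive = [b"status=OK;region=us-east;code=0\n"] * n
varied = [bytes(rng.getrandbits(8) for _ in range(32)) for _ in range(n)]

for name, batch in (("repetitive", repetitive), ("varied", varied)):
    print(name, "->", compressed_size(batch), "bytes compressed; target was", TARGET)
```

Both batches hold the same number of records, yet the repetitive one lands far below the target and the varied one far above it. A fixed ratio only works when every partition compresses like the "typical" one.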

With Syncsort’s Hadoop ETL Solution (DMX-h), the customer can now simply define the ideal partition size for the compressed data and let our software do the hard work. At run time, DMX figures out how to split the data so that each chunk reaches the optimal size after compression.
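The general idea of splitting on compressed size rather than raw size can be sketched in a few lines of Python. This is only an illustration of the concept, not how DMX-h is implemented. The function name `split_by_compressed_size`, the zlib codec, and the record-at-a-time probing strategy are all assumptions.

```python
import zlib


def split_by_compressed_size(records, target_bytes):
    """Yield self-contained zlib streams, each at most target_bytes long.

    A partition may exceed the target only if a single record, compressed
    on its own, is already larger than target_bytes.
    """
    comp = zlib.compressobj()
    emitted = []  # compressed chunks produced so far for the current partition
    count = 0     # records accepted into the current partition

    for rec in records:
        # Probe on a copy so the real compressor is untouched if we reject.
        trial = comp.copy()
        out = trial.compress(rec)
        tail = trial.copy().flush()  # bytes needed to close the stream now
        size_if_closed = sum(len(c) for c in emitted) + len(out) + len(tail)

        if size_if_closed > target_bytes and count > 0:
            # Adding this record would overshoot: close the partition first.
            emitted.append(comp.flush())
            yield b"".join(emitted)
            comp = zlib.compressobj()
            emitted = [comp.compress(rec)]
            count = 1
        else:
            comp = trial
            emitted.append(out)
            count += 1

    if count > 0:
        emitted.append(comp.flush())
        yield b"".join(emitted)


# Illustrative input: 5,000 small text records, 2 KB compressed target.
records = [f"record-{i:06d},value={i * 7919 % 1000}\n".encode() for i in range(5000)]
parts = list(split_by_compressed_size(records, target_bytes=2048))
print(len(parts), "partitions; largest is", max(len(p) for p in parts), "bytes")
```

Each yielded partition is a complete zlib stream, so it can be decompressed independently, which is the property parallel processing relies on. Copying the compressor for every record is expensive; a production implementation would probe far less often, but the sketch keeps the size guarantee easy to verify.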

The feedback has been very positive. The heuristics are no longer needed, and Syncsort developers can take pride in helping ensure the success of an enterprise that relies on extracting value from its Big Data.
