3 Reasons Why Sort is Still King

I wrote half of this blog post at the airport while waiting for a flight. The guy across from me noticed the Syncsort sticker on my laptop and said something I have heard hundreds of times in the last three years.

“Hey I used to use your product back in the day, it was awesome, I loved it.”

Then he paused before adding:

“Are you still around?”

Yeah, we’re still around. I know why he forgot about us, but I also know why we’re worth remembering.

Why Everyone Loved Sort

In the heyday of the mainframe, data was largely stored in files. If you wanted to answer a question that involved all the records in a file, say looking for duplicates, you had to organize that file, and that meant sorting. Sorting is tough work: it takes a lot of resources and can be slow. Worse than slow is broken. If you are not careful with a sort, you can easily run out of resources and never finish the computation. So most important questions required a sort (I like to call this a “sort shaped hole” in a design; it’s a natural part of the design), and that sort had better be well implemented if you ever wanted it to finish. Syncsort provided a great sort, and everyone loved it.
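
To make the duplicate-hunting example concrete, here is a minimal sketch in Python. It is not how Syncsort's product is implemented; it only illustrates the general idea. It assumes records are newline-terminated text lines, and the names external_sort, duplicates and chunk_size are made up for this illustration. Sorting in fixed-size chunks that are spilled to temporary files and then merged is one common way to avoid running out of memory on a large file.

```python
import heapq
import itertools
import os
import tempfile


def _write_run(sorted_lines):
    """Write one sorted chunk to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(sorted_lines)
    return path


def external_sort(lines, chunk_size=100_000):
    """Yield lines in sorted order, holding at most chunk_size lines in memory at once."""
    run_paths = []
    it = iter(lines)
    while True:
        # Normalize so every record ends with a newline before it is spilled to disk.
        chunk = [ln if ln.endswith("\n") else ln + "\n"
                 for ln in itertools.islice(it, chunk_size)]
        if not chunk:
            break
        chunk.sort()
        run_paths.append(_write_run(chunk))

    files = [open(p) for p in run_paths]
    try:
        # heapq.merge reads the sorted runs lazily and yields one merged sorted stream.
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)


def duplicates(sorted_lines):
    """Yield each value that appears more than once in an already sorted stream."""
    for value, group in itertools.groupby(sorted_lines):
        next(group)                      # first occurrence
        if next(group, None) is not None:
            yield value                  # a second occurrence exists


if __name__ == "__main__":
    records = ["b\n", "a\n", "c\n", "a\n", "b\n"]
    for dup in duplicates(external_sort(records, chunk_size=2)):
        print(dup, end="")
```

Once the file is sorted, equal records sit next to each other, so finding duplicates takes a single pass that compares neighbors. That is the "sort shaped hole" in practice: the question is easy once the sort is done.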

Why Everyone Forgot About Sort

So what happened to people caring about sort? The answer is databases (more specifically relational databases, but most people just call them databases). Large files are not that great for finding and manipulating individual records. Also, while you can design a business process that works against large files, it takes some effort, so it’s not great for answering ad hoc questions. A system that pre-organizes the data makes it easy to ask a variety of questions without much process design. While databases do sort, they are not as reliant on it, and they largely hide the sort from the programmer. Once you start keeping your data in a database, you stop thinking about sorting, and that’s exactly what happened. Slowly, people started designing processes around databases and other technology and forgot about sorting.

Why Everyone Remembered Sort

However, sorting is still relevant, and in some cases it is still the best option for your data processing. Here is my list of some of the top scenarios in which sorting is still king.

  1. Sort is Best When You Are Already Sorting
    It’s a lot easier to use a sort package to keep the process design, and much of the code, in place when migrating or updating applications. This can minimize cost and risk while maintaining or improving the performance of the application. Often application updates involve transitions to Linux or Windows. The sort can get left out of the new design because it is perceived as a resource from the old environment. This does not have to be the case. Syncsort currently has sorting packages for all major Linux, Windows and Unix platforms, and these packages are compatible with all old Syncsort code. If an application is coming off the mainframe, or was already designed with a sort on Unix, Syncsort was very likely the sort used. Sort code written in the late 70s can still work on Syncsort today.
  2. Sort is Best for Batch Processing
    Sorting is great when you are fundamentally doing batch processing. Face it, not all business processes are real time or queue driven. Lots of organizations get daily, weekly or monthly batch feeds of data. Because databases have become so ubiquitous, the following workflow emerges:

    • Shove batch data into a database
    • Execute saved SQL scripts against the new tables
    • Export the results
    • Drop or archive the newly created tables in preparation for the next batch.

    This is not the best use of a database, which is designed to organize and persist data. It incurs DB licensing costs and administrative overhead. Sort-based workflows can perform the same data processing directly on the inbound batch files. (Look for a future blog post that explains how operations that people think of as SQL are implemented as sorts.) The results can be sent on to other business processes, and the inputs archived or discarded as necessary.

  3. Sort on Embedded Linux Streamlines Reporting
    Linux is the embedded operating system in so many devices. Telecom switches, industrial equipment and scientific equipment are a few examples. They use Linux because it is robust, reliable and, most importantly, cost effective. All of these devices generate log files, and sometimes you need to do some basic reporting on these files. Sorting offers a great way to do basic reporting on the embedded Linux boxes without having to deploy a database. Sorting engines are generally much more compact than databases, and they can work directly on the log files. Scripts can be developed centrally and then easily deployed to the embedded systems.
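
As a rough illustration of scenarios 2 and 3, the sketch below does the kind of work people usually reach for SQL's GROUP BY to do, but with a sort followed by a single pass over the sorted records. It is written in Python purely for illustration and does not use or imitate Syncsort's own syntax. The function name sort_and_summarize and the column names device and bytes are hypothetical, and the input is assumed to be a small CSV file with a header row that fits in memory.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def sort_and_summarize(lines, key_field, value_field):
    """Sort CSV records by key_field, then count and total value_field per key."""
    rows = list(csv.DictReader(lines))
    # Sorting brings all records with the same key next to each other.
    rows.sort(key=itemgetter(key_field))

    summary = []
    # groupby only groups adjacent records, which is why the sort must come first.
    for key, group in groupby(rows, key=itemgetter(key_field)):
        group = list(group)
        total = sum(float(r[value_field]) for r in group)
        summary.append((key, len(group), total))
    return summary


if __name__ == "__main__":
    # Hypothetical log or batch feed: one record per line, no database involved.
    feed = io.StringIO("device,bytes\nsw2,10\nsw1,5\nsw2,30\nsw1,15\n")
    for device, count, total in sort_and_summarize(feed, "device", "bytes"):
        print(device, count, total)
```

The same pattern works on an inbound batch file or a device log: sort on the reporting key, walk the sorted records once, and emit one summary line per key.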

These scenarios are what came to mind when I sat down to write this blog post, but they are just part of the way I think about sorting. Every day I work with a different few of the thousands of customers, large and small, using our sorting products for key applications. We are the market leader in sorting and produce the best general purpose sorting package. You can check out our solutions at www.syncsort.com or contact us with questions.

Peter Coppenrath

Manager, Engineering, Sort Development
