Milk Makes Big Data 4 Times Faster, But It’s No Silver Bullet
Milk, a new programming language designed to optimize data retrieval, can make big data applications up to four times faster. Does that mean the days of Pig Latin, Julia, Python and other established languages are numbered? Keep reading for some insights on Milk and its impact on the big data world.
Milk, which was developed by researchers from MIT and unveiled back in September, brings a simple but powerful innovation to big data processing. It aims to make the retrieval of data from main memory more efficient in order to improve performance.
How Milk Works
The central idea behind Milk is this: Typically, when an application retrieves a particular piece of data that is stored in memory, it pulls data stored alongside it as well.
In the context of traditional computing, this approach is an advantage. If an application wants to do something with one piece of data, there’s a good chance that it will need the neighboring data as well. Retrieving those neighbors at the same time as the requested data minimizes the number of requests the system has to handle in order to deliver data to the application.
But what if your application doesn’t actually need all of that neighboring data? What if it really, truly just wants to work with one specific piece of data, and has no interest in other information stored nearby?
In that case, retrieving all of the neighboring data would be a waste of resources. It would be like buying an entire bookshelf’s worth of books when all you really want is to read a single novel, or like renting a twenty-foot U-Haul in order to move out of a small studio apartment.
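Milk addresses this problem. An application written in Milk doesn’t immediately fetch a piece of data, along with its neighbors, the moment it discovers that it needs it. Instead, it notes the address of the data and waits until a sufficient number of addresses have accumulated. Only then does it pull information out of memory, grouping the requests so that items stored near one another are retrieved together.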
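The deferral described above can be illustrated with a toy simulation. The sketch below is not Milk syntax and not the researchers’ implementation. It is a minimal Python model under stated assumptions: a tiny least-recently-used cache, a made-up line size, and two hypothetical helpers, `count_line_fetches` and `batched`. It counts how many lines must be fetched from memory when scattered requests are served immediately, versus when they are collected into batches and served in address order.

```python
import random
from collections import OrderedDict

LINE_SIZE = 8     # items per cache line (assumed for illustration)
CACHE_LINES = 4   # number of lines the toy cache can hold (assumed)


def count_line_fetches(indices):
    """Count lines fetched from memory for a sequence of accesses (LRU cache)."""
    cache = OrderedDict()
    fetches = 0
    for i in indices:
        line = i // LINE_SIZE
        if line in cache:
            cache.move_to_end(line)        # hit: mark line as recently used
        else:
            fetches += 1                   # miss: the whole line is fetched
            cache[line] = True
            if len(cache) > CACHE_LINES:
                cache.popitem(last=False)  # evict the least recently used line
    return fetches


def batched(indices, batch_size):
    """Defer accesses: collect batch_size requests, then serve them in address order."""
    ordered = []
    for start in range(0, len(indices), batch_size):
        ordered.extend(sorted(indices[start:start + batch_size]))
    return ordered


rng = random.Random(0)
requests = [rng.randrange(1024) for _ in range(2000)]  # scattered accesses

immediate = count_line_fetches(requests)
deferred = count_line_fetches(batched(requests, batch_size=500))
print("line fetches, immediate:", immediate)
print("line fetches, batched:  ", deferred)
```

In this model the batched run fetches far fewer lines, because requests that land on the same line are served together instead of evicting one another. Real gains depend on the workload and the hardware.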
Milk and Big Data
Software written in Milk can deliver four-fold performance advantages for big data applications, according to the researchers who developed the Milk framework. That’s because, when you’re working with big data – as opposed to more traditional types of computing – it’s often the case that retrieving more data than you specifically need will not prove beneficial.
Consider, for example, an application that makes product recommendations on a website by examining a user’s purchasing history, which is stored in a database. When the application wants to make a recommendation, it needs the purchasing history data associated only with the specific user in question. Pulling the data from other users’ accounts that happen to be stored close to the same location in the database table will just be a waste of time and computing power.
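A back-of-the-envelope sketch makes that waste concrete. It treats each user’s record as a fixed-size entry in one contiguous array, which is a simplification, and the sizes and the `useful_fraction` helper are assumptions chosen for illustration rather than figures from the Milk researchers.

```python
LINE_BYTES = 64    # assumed size of one fetch unit (a cache line)
RECORD_BYTES = 16  # assumed size of one user's record (hypothetical)


def useful_fraction(user_ids):
    """Fraction of the fetched bytes that belong to the requested users."""
    records_per_line = LINE_BYTES // RECORD_BYTES
    # Every requested record drags in the whole line it lives on.
    lines = {uid // records_per_line for uid in user_ids}
    fetched = len(lines) * LINE_BYTES
    used = len(set(user_ids)) * RECORD_BYTES
    return used / fetched


print(useful_fraction([7, 4203, 91855]))  # scattered users -> 0.25
print(useful_fraction([0, 1, 2, 3]))      # neighboring users -> 1.0
```

With these assumed sizes, looking up three unrelated users uses only a quarter of the bytes that were fetched, while four neighboring records use all of them.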
The same waste occurs, by the way, whether the data you’re working with is stored on disk or in memory, although Milk itself targets the retrieval of data from main memory.
Milk could also be especially useful when working with unstructured data, like the type usually stored in NoSQL databases. Even more than in a relational database, the location of a given piece of unstructured data in a NoSQL database is unlikely to have any meaningful relationship to the data stored near it.
Is Milk the Killer Big Data Language?
So, if Milk makes big data processing so much more efficient, is it poised to make other big data languages, like Pig Latin, Julia and Python, obsolete?
The short answer is no. The world of big data is big enough for Milk to live alongside these earlier arrivals at the big data party.
After all, Milk is not a silver bullet. It can deliver better performance under certain types of conditions, as described above. But it’s not always going to be the best tool for the job.
The other languages all have their strengths. Pig Latin has the advantage of being very simple. Python is a highly extensible and easy-to-read language. It is also widely used outside of the big data world, making it convenient for people who don’t want to learn a new programming framework just to write big data software. Julia is a very complete and sophisticated language, which also boasts great performance in many cases.
Plus, performance is not the be-all and end-all of big data applications. (If it were, no one would be using Python, which is, generally speaking, a pretty slow language, for big data.) So Milk’s performance advantages won’t automatically prompt data scientists to flee in droves from the frameworks they have been using for years.
A Holistic Strategy for Big Data
Milk is a great new tool to add to your big data arsenal. But a highly successful big data operation requires having a broad array of different solutions in place and integrating them well, in order to meet whichever challenges you face.
In addition to having different programming languages at your disposal, you also want solutions that help you move data between different types of systems easily, such as the ones Syncsort provides.