You know Hadoop. You know Spark. But do you know the rest of the open source Big Data world? If not, this post is for you: Keep reading for a roundup of important open source Big Data platforms you may not have heard of.
Names like Hadoop and Spark, not to mention Kafka, Sqoop and Storm, have dominated the Big Data conversation for years. They were among the earliest Big Data platforms to make a big splash, and there’s little doubt that their popularity will continue.
But the open source Big Data ecosystem is much bigger than just a few projects.
5 Open Source Big Data Projects to Watch
Here’s a list of other open source tools for collecting, storing, ingesting and analyzing data that are worth watching.
Samza is a framework for processing streaming data. It’s similar to Spark Streaming, but it differs in a few ways. Probably the biggest is that Samza is event-focused, handling messages one at a time as they arrive, whereas Spark Streaming is batch-focused, grouping incoming data into small batches before processing it. Samza also sports a modular API that makes it possible to extend the framework to support messaging systems other than Kafka (which Samza uses by default) and a variety of execution environments.
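To make the event-focused versus batch-focused distinction concrete, here is a minimal conceptual sketch in plain Python. It uses neither the Samza nor the Spark API; the function names are made up for illustration, and it assumes batches are cut by message count, whereas a real micro-batch engine would typically cut them by time interval.

```python
from typing import Callable, Iterable, Iterator, List


def process_per_event(events: Iterable[int],
                      handler: Callable[[int], int]) -> Iterator[int]:
    """Event-at-a-time style: each message is handled as soon as it arrives."""
    for event in events:
        yield handler(event)


def process_micro_batches(events: Iterable[int], batch_size: int,
                          handler: Callable[[List[int]], int]) -> Iterator[int]:
    """Micro-batch style: messages are buffered, then each batch is handled as a unit.

    Assumption for illustration: batches are cut by count, not by time.
    """
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield handler(batch)
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield handler(batch)


stream = [1, 2, 3, 4, 5]

# One result per incoming message.
per_event = list(process_per_event(stream, lambda e: e * 10))

# One result per batch of two messages (here, the sum of each batch).
per_batch = list(process_micro_batches(stream, 2, sum))

print(per_event)
print(per_batch)
```

The practical trade-off is latency versus throughput: the per-event version produces a result for every message immediately, while the batched version waits until a batch fills up but can amortize overhead across the messages in it.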
Hadoop applications that run in a distributed environment can perform very well, but they are complicated to write. Twill, an abstraction layer over Hadoop YARN, aims to make programming distributed applications easier. Its promise is that programmers can focus on the application itself rather than on figuring out how to work with a distributed architecture.
Like Samza, Flink is a streaming data framework, though it handles batch workloads as well. It aims to be the new Spark, and it just might succeed.
Go, a relatively new programming language, has a sizeable following in general. But Go has not received much attention from the Big Data community. That may now be changing, as companies like Intel are becoming keen on promoting Go for Big Data applications. If you’re a data scientist and haven’t yet given Go a shot, check it out. It’s not going to replace R, but you just might like it more than, say, Python, if that is what you are currently using for general scripting.
Collecting and analyzing data from sensors, smart devices and other data sources on the Internet of Things (IoT) will become increasingly important as the IoT expands. Kaa is an open source platform that aims to give developers one-stop shopping for writing software for IoT applications. Data collection and analytics are among its core features. While platforms like Hadoop will no doubt be important for IoT as well, Kaa offers another option.