Beyond Hadoop: 5 Open Source Big Data Projects to Watch

Beyond Hadoop: 5 Open Source Big Data Projects You May Have Missed

You know Hadoop. You know Spark. But do you know the rest of the open source Big Data world? If not, this post is for you: Keep reading for a roundup of important open source Big Data platforms you may not have heard of.

Names like Hadoop and Spark – not to mention the likes of Kafka, Sqoop and Storm – have dominated the Big Data conversation for years. They were among the earliest Big Data platforms to make a big splash, and there’s little doubt that their popularity will continue.

But the open source Big Data ecosystem is much bigger than just a few projects.

5 Open Source Big Data Projects to Watch

Here’s a list of other open source tools for collecting, storing, ingesting and analyzing data that are worth watching.

1. Samza

Samza is a framework for processing streaming data. It’s similar to Spark Streaming, but different in a few ways. Probably the biggest is that Samza is event-focused, handling messages one at a time as they arrive, whereas Spark Streaming is batch-focused, grouping incoming data into small batches before processing it. Samza also sports a modular API that makes it possible to extend the framework to support messaging systems other than Kafka (which Samza uses by default) and a variety of execution environments.
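
The difference between event-focused and batch-focused processing can be sketched in a few lines of plain Python. This is a conceptual illustration only: it does not use the Samza or Spark APIs, the function names are hypothetical, and it groups batches by count for simplicity, whereas real micro-batch systems typically group by time interval.

```python
from typing import Callable, Iterable, Iterator, List


def process_per_event(events: Iterable[int],
                      handle: Callable[[int], int]) -> Iterator[int]:
    """Event-at-a-time: emit a result as soon as each event arrives."""
    for event in events:
        yield handle(event)


def process_micro_batches(events: Iterable[int], batch_size: int,
                          handle_batch: Callable[[List[int]], int]) -> Iterator[int]:
    """Micro-batch: buffer events, then emit one result per batch."""
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield handle_batch(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        yield handle_batch(batch)


# Hypothetical sensor readings standing in for a message stream.
readings = [3, 5, 2, 8, 1]

per_event = list(process_per_event(readings, lambda x: x * 2))
per_batch = list(process_micro_batches(readings, 2, sum))

print(per_event)  # [6, 10, 4, 16, 2]
print(per_batch)  # [8, 10, 1]
```

In the first style, every reading produces output immediately, which tends to favor low latency. In the second, results appear only once a batch fills, which trades some latency for the chance to process records in bulk.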

2. Twill

Hadoop applications that run in a distributed environment perform very well, but they are complicated to write. Twill aims to make programming distributed applications easier. Its promise is that programmers can focus on the applications themselves, rather than on figuring out how to work with a distributed architecture.

3. Flink

Like Samza, Flink is a streaming data framework. It wants to be the new Spark, and it just might succeed.

4. Go

Go, a relatively new programming language, has a sizeable following in general. But Go has not received much attention from the Big Data community. That may now be changing, as companies like Intel are becoming keen on promoting Go for Big Data applications. If you’re a data scientist and haven’t yet given Go a shot, check it out. It’s not going to replace R, but you just might like it more than, say, Python, if that is what you are currently using for general scripting.

5. Kaa

Collecting and analyzing data from sensors, smart devices and other data sources on the Internet of Things (IoT) will become increasingly important as the IoT expands. Kaa is an open source platform that aims to give developers one-stop shopping for writing software for IoT applications. Data collection and analytics are among its core features. While platforms like Hadoop will no doubt be important for IoT as well, Kaa offers another option.

Download Syncsort’s eBook, Bringing Big Data to Life: What the Experts Say, for more insights on the future of open source Big Data projects.

Christopher Tozzi

Christopher Tozzi has written about emerging technologies for a decade. His latest book, For Fun and Profit: A History of the Free and Open Source Software Revolution, is forthcoming with MIT Press in July 2017.
