Just How Big is Big Data, Anyway?

You know that Big Data involves lots of data. But have you ever stopped to think about just how much data goes into Big Data? In other words, how big is Big Data, really?

Defining Big Data

Before delving into the question of how big a data set has to be in order to count as Big Data, let’s discuss the difficulty of defining what the term actually means.

There is no official definition of Big Data, of course. What one person considers Big Data may just be a traditional data set in another person’s eyes.

That hasn’t stopped people from offering up various definitions of Big Data, however. For example, some would define it as any type of data that is distributed across multiple systems.

In some respects, that’s a good definition. Distributed systems tend to produce much more data than localized ones because distributed systems involve more machines, more services, and more applications, all of which generate more logs containing more data.

On the other hand, you can have a distributed system that doesn’t involve much data. For instance, if I mount my laptop’s 500-gigabyte hard disk over the network so that I can share it with other computers in my house, I would technically be creating a distributed data environment. But most people wouldn’t consider this an example of Big Data.

Another way to try to define Big Data is to compare it to “little data.” By this definition, Big Data is any data that is processed using advanced analytics tools, while little data is interpreted in less sophisticated ways. The size of the actual data sets isn’t what matters here.

This is also a valid way of thinking about what Big Data means. The big problem with this approach, however, is that there’s no clear line separating advanced analytics tools from basic software scripts. If you define Big Data only as data that is analyzed using Hadoop, Spark or another complex analytics platform, you risk excluding data sets that happen to be processed with R scripts instead, for instance.
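To make that fuzziness concrete, here is a minimal sketch of the same simple aggregation written first as a basic script and then as a distributed PySpark job. The file name, column layout and the choice of Python are illustrative assumptions, not anything the definitions above prescribe; the point is only that both versions do the same work, so “uses an advanced tool” is a shaky dividing line.

```python
# A minimal sketch (assumed file name and column layout): counting records
# per region, first as a "basic script," then as a distributed Spark job.

# Plain Python -- fine when the file fits comfortably on one machine.
from collections import Counter

counts = Counter()
with open("transactions.csv") as f:
    for line in f:
        region = line.split(",")[2]  # assume the third column holds a region code
        counts[region] += 1
print(counts.most_common(5))

# PySpark -- the same aggregation, but spread across a cluster.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("region-counts").getOrCreate()
df = spark.read.csv("transactions.csv")   # columns arrive as _c0, _c1, _c2, ...
df.groupBy("_c2").count().show()
```

Is the second version Big Data and the first not, even when both read the same file? Any definition based purely on tooling has to answer that awkward question.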

So, there’s no universal definition of Big Data, but there are multiple ways to think about it. That’s an important point to recognize, because it highlights the fact that Big Data can’t be defined in quantifiable terms alone.

Examples of Big Data

What we can do, however, is gain a sense of just how much data the average organization has to store and analyze today. Toward that end, here are some metrics that help put hard numbers on the scale of Big Data today:

All of the above are examples of sources of Big Data, no matter how you define it. Whether or not you analyze these types of data using a platform like Hadoop, and regardless of whether the systems that generate and store the data are distributed, it’s a safe bet that data sets like those described above would count as Big Data in most people’s books.

The Big Data Challenge

It’s also clear that the data sets represented above are huge. Even if your organization doesn’t work with those specific types of data, they provide a sense of just how much data various industries are generating today.

To work with that data effectively, you need a streamlined approach. You need not just powerful analytics tools, but also a way to move data from its source to an analytics platform quickly. With so much data to process, you can’t waste time converting it between different formats or offloading it manually from an environment like a mainframe (where lots of those banking, airline and other transactions take place) into a platform like Hadoop.
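To give a feel for what that manual conversion work looks like, here is a minimal Python sketch of flattening fixed-width, mainframe-style records into CSV before they can be loaded anywhere else. The field names, widths and file names are assumptions for illustration only, and real mainframe extracts (EBCDIC encodings, packed decimal fields) are considerably messier.

```python
# A hypothetical sketch of manual format conversion: turning fixed-width
# records into CSV. Field names, widths and file names are assumptions.
import csv

FIELDS = [("account_id", 10), ("txn_date", 8), ("amount", 12)]  # assumed layout

def parse_record(line):
    """Slice one fixed-width record into its fields."""
    values, start = [], 0
    for _name, width in FIELDS:
        values.append(line[start:start + width].strip())
        start += width
    return values

with open("transactions.dat") as src, open("transactions.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    writer.writerow([name for name, _ in FIELDS])
    for record in src:
        writer.writerow(parse_record(record))
```

Multiply that by every feed, every layout change and every target format, and the appeal of automating the step becomes obvious.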

That’s where solutions like Syncsort’s come in. Syncsort’s data integration solutions automate the process of accessing data in legacy environments and moving it to next-generation platforms, preparing it for analysis with modern tools.

But no matter how you define it, Big Data is in a state of evolution. Discover how the new data supply chain impacts how data is moved, manipulated, and cleansed – download the new eBook The New Rules for Your Data Landscape today!

 

Authored by Christopher Tozzi

Christopher Tozzi has written about emerging technologies for a decade. His latest book, For Fun and Profit: A History of the Free and Open Source Software Revolution, is forthcoming from MIT Press in July 2017.
