Open Data is Great -- But Only If You Ensure Data Quality

Open data is all around us these days, which is a great thing. To leverage open data effectively, however, you need to be prepared to address the data quality risks. Here’s why and how.

Defining Open Data

The open data movement takes its cues from the open source software movement.

Open source software refers to programs whose source code is available for the public to download, inspect, modify and, if desired, expand.

In a similar fashion, open data refers to data sets that anyone can access and use as they wish.

Open data is usually free of cost, although that is not the defining characteristic. Openness – that is, the quality of being openly accessible to anyone – is what makes open data what it is.

Government agencies have become one of the leading sources of open data. Governments such as New York City’s and the federal government of the United States make data sets freely available online.

Scientific research projects also sometimes provide open data sets. The Human Genome Project makes a range of important data sets freely available, for example.

Why Use Open Data

Simply put, open data is a great resource. Companies can and should take advantage of open databases when the data fit their needs. In many cases, doing so is a fast and cost-effective way to gain access to data that can drive analytics engines and deliver important insights.

For example, imagine that your company wants to know what kind of public Wi-Fi infrastructure is available to its customers, so that it can predict how much bandwidth an app can count on for those customers. If the customers live in New York City, the company can grab the city’s open data on public Wi-Fi availability. That’s a lot faster and easier than compiling all that data from scratch.

Open Data and Data Quality

As great as open data is, it comes with a caveat. In some cases, it may not provide the data quality required to make the data actionable.

This isn’t because open data sets are inherently low in quality. The fact that they are (usually) free does not mean you can’t trust the data inside them. Some open databases may be unreliable, but most open projects provide data that is as reliable as any you collect yourself. (Indeed, you could argue that because open data sets are available for anyone to inspect, they are likely to have fewer errors, since more people are in a position to notice when something is wrong.)

Still, no data set is perfect, and open databases are no exception. Take the open database related to Wi-Fi in New York City. The database includes the street address for each Wi-Fi access point, along with latitude and longitude coordinates. If it is important for you to know for certain exactly where each Wi-Fi access point is located, you’d want to cross-check this information to make sure the street addresses align with the map coordinates.
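As a rough illustration, here is a minimal Python sketch of that kind of cross-check. It assumes you have already obtained reference coordinates for each street address from a geocoding source you trust. The field names (`address`, `lat`, `lon`), the `reference` lookup and the 100-meter tolerance are hypothetical choices for this example, not part of the New York City data set.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    r = 6_371_000  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def find_coordinate_mismatches(records, reference, tolerance_m=100.0):
    """Flag records whose listed coordinates are far from the reference
    coordinates for the same street address.

    records:   list of dicts with hypothetical keys 'address', 'lat', 'lon'
    reference: dict mapping address -> (lat, lon) from a trusted source
    """
    mismatches = []
    for rec in records:
        ref = reference.get(rec["address"])
        if ref is None:
            continue  # no reference coordinates, so nothing to compare against
        distance = haversine_m(rec["lat"], rec["lon"], ref[0], ref[1])
        if distance > tolerance_m:
            mismatches.append((rec["address"], round(distance)))
    return mismatches
```

The right tolerance depends on how precisely you need to locate each access point; a looser threshold produces fewer false alarms but can miss small errors.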

You’d also probably want to make sure that all the street addresses actually exist. Data entry errors, address changes or other problems could easily introduce flaws into this part of the database.
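One simple way to approach this, sketched below in Python, is to compare each address against a reference list of addresses you already know to be valid. The reference list, the abbreviation table and the function names are assumptions made for this example; a real check would need a far more complete set of normalization rules.

```python
import re

# Hypothetical, deliberately small abbreviation table for this sketch.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "boulevard": "blvd", "road": "rd"}

def normalize_address(address):
    """Lowercase, strip punctuation and abbreviate common street suffixes
    so that trivially different spellings compare as equal."""
    cleaned = re.sub(r"[^\w\s]", "", address.lower())
    words = [ABBREVIATIONS.get(word, word) for word in cleaned.split()]
    return " ".join(words)

def find_unknown_addresses(addresses, known_addresses):
    """Return the addresses that do not appear in the reference list."""
    known = {normalize_address(a) for a in known_addresses}
    return [a for a in addresses if normalize_address(a) not in known]
```

An address flagged this way is not necessarily wrong; it may simply be missing from the reference list, so flagged records are candidates for manual review.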

Data quality tools – including Trillium data quality solutions, which are now part of Syncsort’s suite of Big Data solutions – can help you perform the checks you need to identify and fix potential data quality errors like these.



Authored by Christopher Tozzi

Christopher Tozzi has written about emerging technologies for a decade. His latest book, For Fun and Profit: A History of the Free and Open Source Software Revolution, is forthcoming with MIT Press in July 2017.
