Using Big Data to Fight Fake News
No matter where you fall on the political spectrum, you likely know that “fake news” is a problem. Big Data can help solve it. Here’s how.
Fake news refers to content that presents itself as a news article but may contain deliberately fictitious or misleading information.
Fake articles have always been around, but the rise of social media has made it more difficult for readers to distinguish reliable, objective information from less trustworthy content. In the old days, it was easy to tell a tabloid apart from a major newspaper, but on Facebook, all articles can look the same.
Existing attempts to combat fake news have involved crowdsourcing, in which users vote on whether they believe a news item is fake. This approach is better than nothing, but it doesn’t work well when different groups of users hold sharply differing opinions about whether an article should be promoted, as is the case with many political items.
Another strategy is to have human reviewers decide whether a news item is fake. This approach is subject to possible bias on the part of the reviewers. It also doesn’t scale well, because the amount of fake news you can flag is limited by the number of employees you can dedicate to the task.
Big Data as the Solution to Fake News
There’s a better way. By leveraging Big Data and data analytics, content publishers can identify fake news in a manner that is less likely to break down when politics is involved, and that scales with the volume of content rather than with the number of reviewers.
A data-driven strategy for finding fake news is simple enough to implement. You first need a body of data associated with fake news. Collecting this information may be the hardest step in the process, as it might require manually compiling a set of fake news articles to serve as a foundation.
Once you have such a dataset in place, you can use data analytics to see which trends fake news articles share. Spelling and grammar mistakes may be one example: “News” that is written by people deliberately trying to mislead readers, rather than legitimate journalists, is probably less likely to be well written.
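As a rough illustration of the spelling signal, here is a minimal Python sketch that measures the share of words in an article that are missing from a word list, then averages that rate across a set of articles. The tiny vocabulary, the sample sentences, and the function names `misspelling_rate` and `average_rate` are all made up for this example; a real system would use a full dictionary or a spell-checking library and far more data.

```python
import re
from statistics import mean

# Hypothetical stand-in for a real dictionary (assumption for this sketch).
KNOWN_WORDS = {
    "the", "senator", "voted", "against", "bill", "on", "tuesday",
    "shocking", "truth", "about", "they", "hide",
}

def misspelling_rate(text, vocabulary=KNOWN_WORDS):
    """Return the fraction of words in `text` not found in `vocabulary`."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    unknown = sum(1 for word in words if word not in vocabulary)
    return unknown / len(words)

def average_rate(articles):
    """Average the misspelling rate over a collection of article texts."""
    return mean(misspelling_rate(article) for article in articles)

# Tiny made-up samples, for illustration only.
labeled_fake = ["The shocking truth thay hide abuot the bill"]
labeled_real = ["The senator voted against the bill on Tuesday"]

print("fake:", average_rate(labeled_fake))
print("real:", average_rate(labeled_real))
```

A gap between the two averages on a large labeled dataset would suggest the signal is worth keeping. No gap would suggest dropping it.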
Other data points that could drive analytics include information such as:
- The number of references in an article. Articles containing fewer references may be more likely to be fake, since it’s hard to find reliable external sources if you’re making information up.
- Whether articles are attributed to writers who appear to be real people. To determine whether a purported author is real, you could search databases to see how many other articles appear under a given writer’s name.
- Whether an article contains copy from another article that has already been identified as fake. This would be a solid sign that the fake-news producers plagiarized themselves by using the same text in multiple fake articles.
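The three data points above could each be turned into a number. The Python sketch below is one possible way to do that, under several assumptions: hyperlinks stand in for references, `byline_index` is a hypothetical lookup table of author names to article counts, and copied text is detected by comparing overlapping five-word sequences (often called shingles). None of these choices comes from a specific production system.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")

def count_references(text):
    """Count hyperlinks in the article body as a rough proxy for references."""
    return len(URL_PATTERN.findall(text))

def author_article_count(author, byline_index):
    """Return how many articles carry this byline.

    `byline_index` is a hypothetical dict mapping author name -> article count.
    """
    return byline_index.get(author, 0)

def shingles(text, size=5):
    """Return the set of overlapping `size`-word sequences in `text`."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def overlap_with_known_fakes(text, known_fakes, size=5):
    """Return the largest share of this article's shingles found in any one known fake."""
    own = shingles(text, size)
    if not own:
        return 0.0
    return max(
        (len(own & shingles(fake, size)) / len(own) for fake in known_fakes),
        default=0.0,
    )
```

A high overlap score means much of the article's text also appears in an article already flagged as fake. The reference and byline counts are weaker hints, and they are more useful combined than on their own.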
To be sure, none of these methods is rock-solid. Someone who wants to create fake news badly enough could insert lots of seemingly legitimate external links into an article. They could give the byline to someone known to be a real journalist (without asking permission). They could make sure that all their fake news articles are totally original and don’t borrow text from each other.
But doing these things requires more work. If you can use data to make it harder to create fake news that passes as legitimate, you’re already doing much to help win the war against fake news.
Data-driven methods for identifying fake news are already being deployed. Facebook is reportedly experimenting with the idea, although for the time being the company seems to be relying more on professional moderators to root out fake content. First Draft News is also using data analytics, in conjunction with other methodologies, to help identify inaccurate information online.
While there may be no perfect antidote to fake news, Big Data and data analytics can help to stem the fake news flood. This is just another example of how data is reshaping the way the world runs.