Dark Data: What It is and How to Deal With It
Dark data — the name sounds chillingly ominous, doesn’t it? Like the Dark Web, where evildoers of all sorts are rumored to be plotting our demise at this very minute; or the Dark Sith Lord, scheming for centuries to eliminate the heroic and selfless Jedi Knight. Relax. Dark data isn’t likely to be the cause of the collapse of modern civilization, but it could cause your company some grief if you don’t understand what it is and how to manage it. Here’s the deep, dark data scoop.
What Dark Data is
Dark data is all of the unstructured, untagged, and untapped data that sits on your hard drives, silos, and servers. It takes many forms: notes salespeople made on calls with clients, audio and video files, PowerPoint presentations, Excel spreadsheets, and the ginormous piles of text files your users create on a daily basis. By commonly cited estimates, dark data accounts for about 80 percent of the data stored by the average company.
Often, it is redundant; always, it is bulky. Since storage costs have plummeted to such affordable levels as to be almost a non-issue for most enterprises, keeping it all around doesn’t cause anyone much lost sleep. The term came into vogue in 2012, when Andrew White of Gartner blogged about it, and it has taken hold as one of the many tech buzzwords, such as “smart data,” “predictive analytics,” and “the Internet of Things,” that circulate like blackbirds over a cornfield.
How Dark Data is Dangerous
Shedding light on your dark data can prevent some real nightmares.
Unfortunately, since dark data isn’t readily accessible, searchable, or usable in the grand scheme of big data integration for business intelligence, it’s essentially wasted. That is, unless it winds up in the wrong hands. This data almost always contains intellectual property, confidential proprietary information, and personally identifiable information on customers. Not only can dark data become a legal quagmire if it’s leaked or stolen, it can also damage the company’s reputation. Clients, vendors, and employees don’t readily forgive companies that allow their sensitive information to get loose and drag them through the murky mires of identity theft, lawsuits, or intellectual property theft.
How Dark Data is Wasteful
Less terrifying, but perhaps just as costly, is the waste: all of the information you can’t access and analyze is information you can’t turn into helpful business intelligence. Dark data contains a plethora of handy information that can be used by the entire company. The sales and marketing teams could use all of those sales notes to develop better campaigns. Human resources could benefit from the information on employee productivity that lies in management notes, as well as data on how the benefits programs are working. Finance could use data from those spreadsheets to cut costs and perhaps make better capital investment recommendations. Research and development, production, and top executives could all benefit from illuminating the vast reservoirs of information bound up in that dark data.
How to Deal With Dark Data
Dealing with dark data isn’t impossible, but it isn’t pretty either. It starts with changing how that data is stored, so that data of this sort can be analyzed and put into useful form from now on. New content needs to be preconditioned (tagged and structured as it is created) so that it is searchable and analyzable later. The second step is cleansing and reformatting all of the existing dark data. That means working through the massive chunk of unstructured data that’s just sitting there and eliminating anything useless, corrupted, or redundant (such as the 18 versions of last year’s presentation on company profits). With the right tools and infrastructure in place, dark data can be integrated into a useful part of your big data plans.
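To make the "eliminate redundant data" part of the cleansing step a little more concrete, here is a minimal sketch in Python that groups files by a hash of their contents. It is an illustration under stated assumptions, not a description of any particular product: the function names `file_digest` and `find_duplicates` are made up for this example, and it assumes the dark data lives in an ordinary directory tree that a script can read.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group files under `root` by content; return groups of 2+ files.

    `root` is a hypothetical folder of unmanaged files (an assumption
    made for this sketch).
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # Only groups with more than one member contain redundant copies.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_duplicates("/some/share")` (the path is a placeholder) returns a list of groups, each holding the paths of files whose bytes are identical. Note the limit of this approach: it only catches exact copies. Those 18 versions of last year's presentation are probably slightly different from one another, so they would need a fuzzier comparison plus a human decision about which one to keep.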
In this webcast, Big Data experts from HP Vertica & Syncsort will explain – and demonstrate – how you can shine a light on all your dark data to drive more insights, faster and cheaper.