What’s the Cost of Storage on Hadoop?
The Big Data revolution we are participating in isn’t really about collecting terabytes and petabytes of data. We’re doing that, and we’ve been doing that for years. The digital tape archives at banks and insurance companies contain years of detailed information. There is data aplenty.
The Big Data age’s need for online data storage poses a conundrum for system and network managers.
Data: The Quick and the Dead
What’s special about Big Data is that the information isn’t archived. It’s there, online, ready for inclusion in scheduled reports and ad hoc data-mining expeditions. Big Data isn’t just about having a lot of data. It’s about having that data be live.
That’s why Apache Hadoop is at the center of the Big Data revolution. The Hadoop Distributed File System allows large and small companies to deal with huge amounts of data inexpensively, on clusters of commodity hardware components — and to access that data any time, live.
How Much Can You Save?
How inexpensive is Hadoop-based storage compared to traditional models?
Hard numbers are difficult to come by. Corporations keep their Big Data strategies close to their vests, and mainframe vendors don’t publish price lists — if you want to know what it will cost to add 20 terabytes of storage to your mainframe, you’ll have to sit through lunch with a sales rep before you get a quote.
But some general numbers are available.
Cloudera vice-president Charles Zedlewski recently disclosed some eye-opening figures in an InformationWeek interview. Zedlewski is in a position to know these numbers because he has participated in Hadoop projects with customers.
Zedlewski says the overall price of a Hadoop-based system, including hardware, software, and other expenses, comes to about $1,000 per terabyte.
The Hadoop Advantage
How does that compare? Zedlewski says that traditional network storage solutions cost about $5,000 per terabyte, and sometimes two or three times that much. Legacy systems often store copies of data on multiple systems for live access, and that can multiply costs too — up to $30,000 or even $40,000 per terabyte.
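Put those per-terabyte figures next to the 20-terabyte example from the mainframe anecdote above and the gap is easy to see. Here is a quick back-of-the-envelope sketch; the rates are Zedlewski’s estimates, not formal quotes:

```python
# Per-terabyte rates quoted in the article (Zedlewski's estimates, in dollars).
HADOOP_PER_TB = 1_000        # all-in Hadoop cost: hardware, software, other expenses
TRADITIONAL_PER_TB = 5_000   # typical traditional network storage
REPLICATED_PER_TB = 40_000   # high end, with copies kept on multiple systems

def storage_cost(terabytes, rate_per_tb):
    """Total storage cost in dollars for a given volume at a given $/TB rate."""
    return terabytes * rate_per_tb

tb = 20  # the 20 TB mainframe-expansion example from earlier in the article
print(f"Hadoop:      ${storage_cost(tb, HADOOP_PER_TB):,}")       # $20,000
print(f"Traditional: ${storage_cost(tb, TRADITIONAL_PER_TB):,}")  # $100,000
print(f"Replicated:  ${storage_cost(tb, REPLICATED_PER_TB):,}")   # $800,000
```

Even at the conservative $5,000-per-terabyte rate, the traditional approach costs five times as much; with replicated legacy storage the multiple climbs to forty.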
Low-cost tape archival systems exist, but once data is on tape it’s not part of Big Data anymore. It’s just a tape in the closet. Online mainframe and network storage solutions are prohibitively expensive. Only the Hadoop architecture, with the Hadoop Distributed File System and clusters of inexpensive commodity storage components, meets the urgently growing need for inexpensive Big Data storage.
The Future Is Distributed
It’s no wonder the market is flocking to Hadoop-based solutions. The ability to access huge amounts of live data means enterprises can perform new calculations, track new trends, and discover new relationships that were impossible before, at prices that were inconceivable in the age of the mainframe.