A Quick but Fun History of Big Data (P.S. It’s Older Than You Think!)
‘Big Data’ has been a buzzword for several years now, but the term is just a new name for a practice that is as old as civilization. Ancient peoples collected and stored data on things like commodities and finance. What is new isn’t the act of gathering and analyzing data; it is the sheer quantity of data that can now be stored, processed, and shared in real or very nearly real time. Here is a short history of how big data came to be what it is today: a worldwide phenomenon that stands to change everything from how we buy and sell to how we stay healthy and grow our food.
Data Journalism’s Birth: 1848
Data journalism is usually associated with modern publications that have access to enormous volumes of data from all over the world, but its ancestor was a little-remembered odd fellow by the name of Horace Greeley.
While a young congressman by the name of Abraham Lincoln (ironically remembered for his amazing dedication to the truth) was representing the great state of Illinois, Greeley was editing the New York Tribune, the paper he had founded in 1841. Greeley was quite the character: vocally opposed to the institution of slavery, an unapologetic champion of socialist ideas, and a vegetarian well ahead of his time. In 1848 he published verifiable data showing that Congressman Lincoln had made off with some $677 of taxpayer money (today that would be about $18,700). In the end, the revelation did nothing to slow Lincoln’s progress to the White House, but Greeley did lay important groundwork for today’s data journalists.
The Early Years: 1944
A librarian named Fremont Rider published a book called The Scholar and the Future of the Research Library. In this work, Rider estimated that college libraries in America were doubling in size every 16 years. At that rate, he calculated, by 2040 the Yale Library would contain about 200,000,000 volumes, fill 6,000 miles of bookshelves, and require a staff of more than 6,000 librarians. The exponential trend he described, at least, wasn’t so terribly off.
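Rider's figures can be sanity-checked with a little arithmetic. The sketch below is only a back-of-the-envelope check, not Rider's own calculation: it assumes the doubling starts in 1944 (the publication year, which may not be the baseline Rider used) and works backward to the collection size that would imply, along with the shelf space per volume. All variable names are illustrative.

```python
# Figures quoted from Rider's projection in the text above.
DOUBLING_TIME_YEARS = 16
END_YEAR = 2040
PROJECTED_VOLUMES = 200_000_000
SHELF_MILES = 6_000

# Assumption: growth is measured from the book's publication year.
START_YEAR = 1944
INCHES_PER_MILE = 63_360

# Number of doublings between the start and end years.
doublings = (END_YEAR - START_YEAR) / DOUBLING_TIME_YEARS

# Total growth factor over that period.
growth = 2 ** doublings

# Collection size in the start year that the projection would imply.
implied_start = PROJECTED_VOLUMES / growth

# Shelf width each volume gets if 200 million volumes fill 6,000 miles.
inches_per_volume = SHELF_MILES * INCHES_PER_MILE / PROJECTED_VOLUMES

print(f"Doublings from {START_YEAR} to {END_YEAR}: {doublings:.0f}")
print(f"Growth factor: {growth:.0f}x")
print(f"Implied {START_YEAR} collection: {implied_start:,.0f} volumes")
print(f"Shelf space per volume: {inches_per_volume:.2f} inches")
```

Under these assumptions the projection amounts to six doublings, or a 64-fold increase, and just under two inches of shelf per book, so the volume count and the shelf mileage are at least consistent with each other.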
The Dawn of the Computer Era: 1961
Derek Price was the next person to make a significant prediction about the future explosion of data. In Science Since Babylon, Price wrote that the sum total of scientific knowledge grows exponentially rather than linearly, doubling in volume every 15 years and growing by a factor of ten every 50 years. He based this estimate on the growth in the number of scientific journals and papers published up to his time. He called the trend the Law of Exponential Increase, which holds that every advance in scientific knowledge produces a whole new series of advances at a reasonably constant rate, so the growth can be predicted. Again, Price wasn’t far off the mark at all.
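The two growth figures quoted here, doubling every 15 years and tenfold every 50 years, can be checked against each other. The sketch below assumes clean exponential growth with a fixed doubling time; the function name `growth_factor` is an illustrative choice, not something taken from the book.

```python
def growth_factor(years: float, doubling_time: float) -> float:
    """Return how many times larger a quantity becomes after `years`,
    assuming it doubles once every `doubling_time` years."""
    return 2 ** (years / doubling_time)


# Growth over 50 years when the doubling time is 15 years.
factor_50 = growth_factor(50, 15)
print(f"Growth over 50 years: {factor_50:.2f}x")
```

A 15-year doubling time works out to a factor of roughly ten over 50 years, so the two figures describe the same curve.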
The Mainframe was Born: 1962
Clearly, the idea of big data was there. What was lacking was a convenient way to store it and a powerful way to process it. IBM gave life to the mainframe in 1962, and now the stage was set for data to make a real impact on the world.
Data Storage Becomes a Thing: 1967
This year, B. A. Marron and P. A. D. de Maine published the article “Automatic Data Compression” in the journal Communications of the ACM. The article said, “The ‘information explosion’ noted in recent years makes it essential that storage requirements for all information be kept to a minimum.” It goes on to describe how that could be done in practice, presenting “a fully automatic and rapid three-part compressor which can be used with ‘any’ body of information to greatly reduce slow external storage requirements and to increase the rate of information transmission through a computer.” Clearly, these pioneers understood the storage problem well, long before modern advances in data storage.
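To get a feel for what a general-purpose compressor buys you, here is a minimal sketch using Python's standard `zlib` module. This is a modern DEFLATE-based library and not the three-part method Marron and de Maine described; the sample text is made up for illustration and is deliberately repetitive, so it compresses far better than typical data would.

```python
import zlib

# Made-up, highly repetitive sample text (an assumption for illustration).
sentence = "Storage requirements for all information should be kept to a minimum. "
raw = (sentence * 50).encode("utf-8")

# Compress at the highest compression level (9), then restore the original.
compressed = zlib.compress(raw, 9)
restored = zlib.decompress(compressed)

# Lossless compression must give back exactly the original bytes.
assert restored == raw

ratio = len(compressed) / len(raw)
print(f"Original size:   {len(raw)} bytes")
print(f"Compressed size: {len(compressed)} bytes")
print(f"Compressed to {ratio:.1%} of the original size")
```

The same idea the 1967 article argued for still applies: fewer bytes to store also means fewer bytes to move through the machine.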
Sort Technology Allows for Better Data Management: 1969
It is unlikely that the sleek systems we use to collect and analyze data today would recognize their great-great-grandfathers, such as the IBM 700/7000 series.
By 1969, the technology to leverage big data was available, though in a much more primitive form than it is today. When Syncsort introduced its first sort product, everything was in place to take advantage of the mainframe’s capacity and usher in the era of big data.