Big Data is Like Kissing in the School Yard
Lots of people talk about it.
Very few people are actually doing it.
Even fewer are doing it well.
On Monday night, well over a hundred data geeks met in downtown Manhattan to hear how comScore processes 1.7 trillion digital interactions using Hadoop. comScore is one of those companies that’s not just talking about Big Data but doing it, and doing it extremely well at that.
Joe Caserta, famous for co-authoring The Data Warehouse ETL Toolkit, opened the session by talking about a real-time data application that his company, Caserta Concepts, recently delivered on Hadoop using Flume. He admitted that he thought the project was pretty Big Data until he met Mike Brown, the comScore CTO. Mike, who was the first engineer at comScore, explained how since then, using Hadoop, the company has managed to grow to over 1,000 employees and over 2,100 customers. It is now a global internet technology company providing analytics for a digital world. Some interesting stats: over 500 billion events (digital interactions) per month, a daily aggregate of 1.5 billion, 130 billion aggregate records over 92 days, more than 70,000 campaigns monitored across more than 50 countries, and 15 billion distinct cookies created in a month.
Founded in 1999, comScore is best known as the gold standard for measuring digital activity, including website visitation, search, video, social and digital advertising. comScore’s data and technologies are well established as crucial components for measuring and analyzing the rapidly evolving digital world, and are widely deployed at a broad range of publishers, advertising agencies, advertisers, retailers and telecom operators, both in the US and internationally.
comScore combines Global Person Measurement, based on panel data, with Global Device Data, drawn from census data, to create the patent-pending Unified Digital Measurement process now adopted by 90% of the top 100 US media companies. Their data is constantly growing: the most recent peak was over 1.7 trillion events in August 2013. I’ve known Mike for some time, and comScore presentations are always huge draws at the various Hadoop conferences, so it has got to the point where any time I’m presenting on comScore’s usage of Syncsort, I ping Mike to see what the new high-water mark on their data is. We joked before the presentation that whenever Mike and I talk about comScore and Syncsort, we have to apologise because the slides the conference police forced us to submit early are always inaccurate, as volumes have gone up substantially since the deadline.
You can read the gory details, but comScore has an overwhelming majority of machines worldwide included in its UDM, e.g. 91% in the US and 92% in the UK. The only place they don’t have total domination is North Korea, and to be honest they are not too worried, given it’s unlikely there is much advertising going on there.