Hadoop Express Train Moves From Crazy Strata NYC to Next Stop in London in 11 days
After 4 days of too little sleep, too much walking and a fire hose of information, Strata is finally over. The conference was definitely crazy from the moment it officially started on Monday afternoon to the closing sessions on Wednesday. Syncsort’s session with Matt Brandwein from Cloudera and our own Jorge Lopez, on building out an enterprise data hub that lets you integrate mainframe data into Hadoop, was well attended and we had some excellent follow-up questions. It’s always great when people who know the company from the mainframe days comment on how pleased they are to see us being just as innovative with Hadoop.
If you’re having trouble remembering the best bits of Strata, we did a number of interviews with the major players, asking them for the key things you need to know from the event. This one includes Tableau, Appfluent, Cloudera and MapR.
As for my personal takeaways:
The keynotes were great – the idea of an enterprise data hub and the need for Hadoop to cement its role in the enterprise was key. There was lots of discussion about the degree to which different products are “open source enough”, which is a natural consequence of the number of different distributions entering the market, but there is a lot of momentum built up around the early entrants – Cloudera, MapR and Hortonworks. There was also lots of focus on real-world use cases, which was a great evolution from last year.
For me, one of the best talks came from the CEO of infochimps, who talked about his mother’s struggle with cancer. When asked by his mum what he did, he said, “I work with technology that can solve really difficult problems like cancer.” I physically cringed when I heard what came next. She replied, “That’s great son – sounds really simple… when?” What do you say to someone dying from cancer when you realize that this technology has the ability to fundamentally solve the problem long term? His answer of “soon mum, soon” was gut-wrenching, and his observation that the people in the room, working together, could advance the fight against cancer more in a year than has been achieved in the past 30 definitely made me think hard. There were so many other great sessions.
Here are my 5 key takeaways from Strata 2013:
- ETL and ELT replacement – replacing warehouse staging areas (the dirty secret of every warehouse) with Hadoop remains the most profitable and least risky first project
- While moon shot style data projects are no longer impossible thanks to Hadoop, focus first on projects with real business value, though projects that help humanity, like the fight against cancer, need to get funded too
- If you’re building a cluster and just intend to throw it out there and see what the business will do with it, you’re doing it wrong
- Enterprise ready features like security, stability and ease of use for data scientists remain the priority – if your extremely expensive data scientists are going to spend all their time moving and preparing data, you’re wasting their minds and your money – easy, simple, scalable ETL on Hadoop remains critical
- Be wary of people claiming to be big data experts – using dramatic phrases and no substance – I heard a lot of people throw out:
  - This is the end of the EDW
  - You no longer need ETL
  - You can completely replace all your SQL
  - My favorite – the mainframe is dead, or with Hadoop you can completely move off the mainframe
Just like politicians who make false promises, the “big data experts” are not recognizing that the Enterprise Data Warehouse (EDW), the mainframe and ETL are all here to stay, although their roles will change significantly. In fact, Hadoop really means people will do even more ETL – they just seem to call it collect, process and distribute… so it’s not going away, it’s just evolving.
Either way, I agree with the underlying theme from way smarter people than me, like Mike Olson and Doug Cutting at Cloudera – Hadoop is here to stay and is already becoming the new standard for Big Data architectures, whether you call it an enterprise data hub or simply accept that it’s already part of the data centre. We also need to start preparing for the disillusionment phase of the “Gartner hype cycle”, where we will start to have a better understanding of the true pitfalls and challenges with Hadoop. But given what I was seeing at Strata – and this is the greatest thing about Hadoop – its open source base means adoption will only get stronger, given the huge ecosystem already in place to address those challenges – vendors, open source, talent etc.
Overall, it is amazing to see at each industry event how quickly Big Data and Hadoop are evolving into a serious enterprise solution.
See you guys in London in 11 days…