Lessons Learned from Strata London
Paradigm shifts occur in fits and starts. Tablets did not overtake laptops overnight, and laptops — built from many of the same components — have yet to disappear from stores. Perhaps more important is the human element involved in technology transformation. While media attention tends to focus on startups that come and go, or Apple’s latest quarterly earnings report, there is important work being done in the trenches that goes unreported.
Steven Totman (Syncsort) and Matt Brandwein (Cloudera) brought this reality into focus in their joint O’Reilly Strata ’13 presentation in London, “How to Leverage Mainframe Data with Hadoop: Bridging the Gap Between Big Iron and Big Data.”
Watch the full presentation below, then scroll down for additional lessons learned from this year’s conference.
Big Data Suits and Hoodies Unite
As Totman and Brandwein observed, Hadoop can bring information closer to insight by connecting mainframe data with external data sources. The results may not always be predictable. Visionaries can be a bit myopic, which may be one reason that budgeting an IT project remains more art than science, agile engineering practices notwithstanding.
For example, in the wake of an enthusiastic CIO’s push toward Big Data and predictive analytics, it will fall to some mainframe technician to work out “the little things” that need to happen to make Big Data a reality. A few of these little details: converting between EBCDIC and ASCII, or unpacking packed decimal data. Another: gaining access to expensive, locked-down mainframe software catalogs to install conversion utilities or middleware — an activity that probably requires sign-off from someone up the org chart. Hence the idea, first presented by Gartner’s Merv Adrian, that “suits” and “hoodies” will need to work together.
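To make those “little things” concrete, here is a minimal sketch of the two conversions mentioned above — decoding EBCDIC text and unpacking a packed decimal (COMP-3) field. It assumes EBCDIC code page 037 and is illustrative only; a real mainframe extract would need the record layout from the source copybook.

```python
def ebcdic_to_ascii(raw: bytes) -> str:
    """Decode an EBCDIC byte string (assuming code page 037) to text."""
    return raw.decode("cp037")

def unpack_comp3(raw: bytes, scale: int = 0) -> float:
    """Unpack an IBM packed-decimal (COMP-3) field.

    Each byte holds two binary-coded-decimal digits; the low nibble
    of the final byte is the sign (0xD = negative, 0xC/0xF = positive).
    `scale` gives the number of implied decimal places.
    """
    digits = []
    for b in raw[:-1]:
        digits.append((b >> 4) & 0xF)   # high nibble
        digits.append(b & 0xF)          # low nibble
    digits.append((raw[-1] >> 4) & 0xF)  # last digit
    sign_nibble = raw[-1] & 0xF
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0xD:
        value = -value
    return value / (10 ** scale)
```

For instance, the three bytes `0x12 0x34 0x5C` hold the digits 1-2-3-4-5 with a positive sign, so with two implied decimal places the field reads as 123.45.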
Big Machine Data
According to Splunk’s Brett Sheppard, speaking at the same conference, “Machine data is the fastest growing, most complex, most valuable area of big data.” Among the experts in attendance at Strata London ’13, Sheppard’s proposition could be debated on its face. But in light of the recent New York commuter train derailment, it is clear that additional machine sensor data — analogous to what Boeing collects on in-flight aircraft performance — could lead to faster, more conclusive understanding of causes. Used in connection with decision support systems for train operators, information such as track conditions, brake status, wheel integrity and the position of other trains could be used to improve safety.
To Think Faster, Eat Faster
Specialists are still coming to terms with the implications of real-time Big Data. After all, “velocity” and “volume” are two of the oft-cited characteristics that set Big Data apart from, well, whatever came before. While he mentions the well-understood latency associated with pre-processing data and moving it to centralized repositories for analysis, Brian Knox of Talksum goes beyond the obvious trends. Knox is concerned about cross-domain data management, and he discusses the MITRE Common Event Expression standard as one approach.
In fact, this problem has long been studied in the defense and aerospace sector (e.g., “Development of a Knowledge-based System for Multi-Sensor Correlation,” 1983), and such efforts continue to this day, though Big Data may not get a mention. What’s different, as Knox rightly observes, is that a common, machine-ready taxonomy for event representation is now available — albeit only within specific domains. The goal of Knox’s firm, Talksum, is to develop data streams built on this principle:
“Early establishment and encoding of context and intent provides meaning, which supports the ability to deliver critical information in near real-time to interested systems.”
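To illustrate what a machine-ready, taxonomy-driven event might look like, here is a small sketch of a CEE-style event record serialized as JSON. The field names and the `make_event` helper are illustrative assumptions, not taken from the official CEE profile or from Talksum’s products; the point is that context (host, action, status) is encoded at the moment the event is created, so downstream systems need no parsing heuristics.

```python
import json
from datetime import datetime, timezone

def make_event(host: str, action: str, status: str, **extra) -> str:
    """Build a CEE-style structured event (illustrative field names)
    and serialize it as a JSON string."""
    event = {
        "time": datetime.now(timezone.utc).isoformat(),  # encoded at the source
        "host": host,      # where the event originated
        "action": action,  # what happened
        "status": status,  # outcome
    }
    event.update(extra)    # domain-specific context, e.g. sensor IDs
    return json.dumps(event)
```

A consuming system can then route or filter on well-defined fields (`event["status"]`, say) in near real time, rather than scraping free-form log lines.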
Mark Underwood writes about knowledge engineering and Big Data.