Back in December 2013, U.S. District Judge Richard J. Leon ruled, in a lawsuit brought by a legal activist, that large-scale data collection by the National Security Agency was, as the Washington Post put it at the time, “almost certainly unconstitutional.” At issue were what were described as “billions of records” of phone metadata that the NSA had been collecting. The ruling came on the heels of Edward Snowden’s leaks, earlier in 2013, of classified information about various NSA surveillance programs.
In some circles, these events unfairly tainted the concept of metadata with public concern over government intrusion into citizens’ private lives. Indeed, in that 2013 case, Judge Leon argued that “no court has ever recognized a special need sufficient to justify continuous, daily searches of virtually every American citizen without any particularized suspicion.”
What’s a data warehouse custodian — for whom metadata is key for governance and in some cases even compliance — to do? Just as “Video Killed the Radio Star,” can privacy concerns kill metadata, the data warehouse star?
The Metadata Star
Over the past several decades, the business data warehouse has steadily gained momentum, and most decision support analysts see metadata as an indispensable component of any data repository. Metadata, sometimes referred to as “data about data,” is a term coined in 1968 by Philip Bagley, and it is now a fully integrated component of most data warehouse, legacy, and next-generation data repository implementations.
Some, like Reuben Vandeventer of Data Clairvoyance, believe that firms like Google have “built their entire business models on the management and publication of metadata.” Compliance with regulations like Basel II, PCI and SOX is guided by the metadata star. Because nearly all of the world’s largest firms subject to regulation are also mainframe users, z/OS metadata matters to them.
Metadata has worked its way into numerous products and services.
Mainframe and Metadata
Many parts of the z/OS ecosystem depend on a free flow of metadata.
For example, IBM’s own products are rich with metadata. IBM’s Data Server Manager (DSM) collects metadata from SQL queries. The IBM Cognos Framework Manager is referred to in an IBM Guideline document as “a metadata modeling tool that drives query generation for IBM Cognos Business Intelligence.” Access to metadata from VSAM or sequential file maps is part of IBM’s z/OS platform for Apache Spark.
On the third-party front, Software AG, in an Adabas product brief, argues that long-term integrity of metadata must accompany z/OS data archiving processes in order to manage the full data life cycle. The Oracle Tuxedo Metadata Repository enables Tuxedo-managed processes in z/OS to reliably manage brokered events across domains. Oracle Metadata Management is a requisite module in the Oracle Business Intelligence product. The CA Test Data Manager maintains a metadata repository that allows the product to produce test data for z/OS. The SAS Fraud Management tool running on z/OS requires the company’s Metadata Server. Business intelligence products from MicroStrategy support metadata repositories hosted on DB2 for z/OS.
Metadata in the Hadoop Era
Fast forward the digital equivalent of eons, and partnerships like the Cloudera-Syncsort alliance illustrate how pervasively the metadata concept has inserted itself into current data management practices for mainframe data, both on and off the platform. Apache Hadoop is turning 10 this year. It’s been a fast and exciting decade, in which a highly scalable open source technology and its offspring have transformed expectations for data volume, velocity, and variety.
The Hadoop wave has washed over and transformed mainframe technology. Data delivered by Syncsort DMX-h enables Cloudera Enterprise Data Hub (EDH), Cloudera Manager, Apache Sentry, and Cloudera Navigator to expand enterprise metadata repositories. The result, writes Tendü Yoğurtçu, is “a unified metadata repository” and a more complete view of “data lineage.”
With appropriate tools in place, even legacy metadata from COBOL copybooks and VSAM files can be captured. A 2014 Syncsort contribution to Apache Sqoop made this possible.
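To make the idea of copybook metadata concrete, here is a minimal sketch in Python of how field names, types, and lengths might be pulled out of a COBOL copybook. It is not the Syncsort or Apache Sqoop implementation; the copybook fragment, the function names, and the output layout are all invented for illustration. It also assumes simple PIC clauses with default DISPLAY usage and no SIGN SEPARATE, OCCURS, or REDEFINES.

```python
import re

# Hypothetical copybook fragment, invented for illustration only.
COPYBOOK = """\
       01  CUSTOMER-RECORD.
           05  CUSTOMER-ID      PIC 9(8).
           05  CUSTOMER-NAME    PIC X(30).
           05  BALANCE          PIC S9(7)V99.
"""

# Level number, field name, optional PIC clause, terminating period.
FIELD_RE = re.compile(r"^\s*(\d{2})\s+([A-Z0-9-]+)(?:\s+PIC\s+(\S+?))?\.\s*$")


def picture_length(pic):
    """Count character positions in a simple PIC string (assumes DISPLAY usage)."""
    # Expand repeat counts, e.g. 9(7) -> 9999999.
    expanded = re.sub(r"(.)\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic)
    # S (sign) and V (implied decimal point) take no position under these assumptions.
    return sum(1 for ch in expanded if ch in "9XA")


def picture_kind(pic):
    """Classify a PIC string as numeric or alphanumeric, ignoring repeat counts."""
    symbols = re.sub(r"\(\d+\)", "", pic)
    return "numeric" if "9" in symbols else "alphanumeric"


def extract_metadata(copybook):
    """Return one metadata dict per field declared in the copybook text."""
    fields = []
    for line in copybook.splitlines():
        match = FIELD_RE.match(line)
        if not match:
            continue
        level, name, pic = match.groups()
        fields.append({
            "level": int(level),
            "name": name,
            "picture": pic,  # None for group items such as the 01 level
            "kind": "group" if pic is None else picture_kind(pic),
            "length": None if pic is None else picture_length(pic),
        })
    return fields


for field in extract_metadata(COPYBOOK):
    print(field)
```

Records like these are the sort of thing a metadata repository could store alongside the data itself, so that downstream tools know how to interpret each byte range of a mainframe record.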
The payoff for metadata integration is most apparent where governance and compliance drive IT investments. This track for exploiting metadata will continue to be active.
For some far-sighted enterprises, well-structured metadata repositories will power deep learning and other forms of artificial intelligence.
Metadata is likely to be part of new cybersecurity solutions. At Medical Mutual of Ohio, crucial metadata is extracted from z/OS mainframe logs using Ironstream and deposited into Splunk Enterprise to enhance cybersecurity intelligence. This and other Big Iron to Big Data solutions are the “future is now” of enterprise security and operational analytics.
Handled with sufficient ingenuity and transparency, metadata could help protect privacy, perhaps helping to restore public confidence in the private citizen data that enterprises hold, and maybe even in the government systems that were the target of the Snowden disclosures.