
Plunk into Splunk: Machine-Generated Data Is Meant to Be Ingested

At a presentation earlier this year, an Amazon executive asked a technical audience to speculate as to the largest database that Amazon Web Services (AWS) – the cloud services division of Amazon – was currently hosting. The expert guesses were unsurprising. Consumer credit data? U.S. census data? NASA satellite data?

Nope. Instead the answer was performance and billing data about AWS itself. In fact, AWS internal data was so important, and growing at such a pace, that it was a key factor in Amazon’s planning for infrastructure expansion.

Splunk is a $6B Silicon Valley firm whose founders understood the challenges of machine-generated data. Splunk products are often used in settings where analysts need to study machine-generated logs on a large scale.

Operational Intelligence

Splunk’s most often cited sweet spot is “operational intelligence.” Proponents of this specialization argue that the old model for operations management was to alternate policy changes with periodic study, and that study intervals could be as long as weeks or months. With Big Data, they argue, near real-time data can be streamed from operational systems to dashboards staffed by managers, and policies can be changed immediately in response: identifying unauthorized mainframe access and security risks, triggering real-time alerts for deadlocks and exhausted resources, or minimizing downtime by identifying critical failures and supporting triage, repair, and prevention. Operational intelligence can include elements of complex event processing, business process management, and other long-understood information management practices. What’s new is the timeframe for analysis, as well as the type of data being analyzed: machines are generating ever more voluminous data, and operational intelligence puts that data to work. Splunk, its advocates say, is particularly well suited to this new model for analytics.
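To make the alerting piece concrete, here is a minimal sketch of one common ingestion pattern: pushing a machine-generated event into Splunk through its HTTP Event Collector so that a real-time search or alert can react to it. The hostname, token, and event fields below are placeholders, not values from any real deployment.

```python
# Minimal sketch: forward a machine-generated event to Splunk's HTTP Event
# Collector (HEC). The URL and token are placeholders for illustration.
import json
import requests

HEC_URL = "https://splunk.example.com:8088/services/collector/event"  # hypothetical host
HEC_TOKEN = "00000000-0000-0000-0000-000000000000"  # placeholder token

def send_event(event: dict, sourcetype: str = "app:metrics") -> None:
    """POST a single event; Splunk indexes it for dashboards and alerts."""
    payload = {"event": event, "sourcetype": sourcetype}
    resp = requests.post(
        HEC_URL,
        headers={"Authorization": f"Splunk {HEC_TOKEN}"},
        data=json.dumps(payload),
        timeout=5,
    )
    resp.raise_for_status()  # surface ingestion failures immediately

# Example: report an exhausted resource so a real-time alert can fire on it.
send_event({"host": "db01", "resource": "connection_pool", "available": 0})
```

Once events like this are indexed, a saved real-time search on a condition such as `available=0` can drive the dashboard alerts described above.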

One Emerging Use Case: Software Defined Networks

The basic design of the internet, and of Ethernet networks generally, hasn’t changed much in the past twenty years. Some innovations are brewing, though. One of these is Software Defined Networking (SDN). While not a new concept – the underlying ideas can be traced back to 1995 or before – SDN was conceived as a path toward a more agile, abstractly managed, flexible network.

Many expect that for SDN to achieve its aims, Big Data analytics will be needed. A recent Ingram Micro blog post drew an explicit line connecting Big Data to SDN:

Software defined networks eliminate the pain of manual administration by using virtual resources. Control and forwarding are separated and the network is treated as a unified whole so the SDN controller can use the entire network infrastructure to service application workloads as needed.

To manage SDN networks as “a unified whole,” data must be collected from endpoints, server nodes, switches, and a complex mix of legacy and next-gen devices.

As the Ingram author goes on to say, “The intelligence to automate highly distributed networks resides in analytics; assessing data traffic to program responses to traffic behavior, such as optimizing data paths.” Collecting network Big Data at sufficient volume and velocity, and analyzing it quickly enough to support dynamically reconfigurable devices, may be beyond what current systems can deliver.
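As a rough illustration of what that analytics layer does, the sketch below aggregates hypothetical flow records per link and flags links running hot, the kind of signal a controller could use when optimizing data paths. The capacity figure, switch names, and flow records are all invented for the example.

```python
# Rough sketch, not a real SDN controller: tally bytes per link from
# hypothetical flow records and flag links over 80% utilization.
from collections import defaultdict

LINK_CAPACITY_BYTES = 1_000_000_000  # assumed capacity per sampling window

# Invented flow records: (source switch, destination switch, bytes moved)
flows = [
    ("sw1", "sw2", 400_000_000),
    ("sw1", "sw2", 700_000_000),
    ("sw2", "sw3", 150_000_000),
]

bytes_per_link = defaultdict(int)
for src, dst, nbytes in flows:
    bytes_per_link[(src, dst)] += nbytes

# Links above the threshold are candidates for rerouting traffic.
congested = {
    link: round(total / LINK_CAPACITY_BYTES, 2)
    for link, total in bytes_per_link.items()
    if total / LINK_CAPACITY_BYTES > 0.8
}
print(congested)  # {('sw1', 'sw2'): 1.1}
```

A real deployment would of course stream these records continuously and at far higher volume; that volume-and-velocity gap is exactly the concern raised above.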

Lighting up Storm Clouds

Like many enterprise software suppliers, Splunk offers scaled-down “personal” versions of its product. The personal edition is no toy, though: users can ask Splunk Free to “index” up to 500MB daily and keep adding to it essentially indefinitely (10TB is mentioned as a possible ceiling).
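Some quick arithmetic shows how generous that is: at 500MB per day, reaching a 10TB ceiling would take 10,000,000MB ÷ 500MB/day = 20,000 days, or roughly 55 years of continuous indexing.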

There’s more. Splunk Storm is also free and includes 20GB of storage. The company suggests that potential Splunk Storm uses cut across operational intelligence, systems troubleshooting, and utilization analysis.

If you need better operational intelligence but haven’t figured out what to do with all those chattering, event-recording logs in your data center, the gathering clouds are not all dark. Syncsort’s Ironstream can capture mainframe machine data and feed it to Splunk Enterprise, offering an invaluable 360-degree view of an organization’s IT systems.

There are few limits to the types of data that can be plunked into Splunk. Read more about using Syncsort with Splunk.

Authored by Mark Underwood

Syncsort contributor Mark Underwood writes about knowledge engineering, Big Data security and privacy.
