How Hadoop is Transforming Telecom

Telecommunications has changed a lot since the breakup of the AT&T monopoly. But one thing hasn’t changed: telecommunications is a capital-intensive business with a big appetite for data. Today it is competitive as well as capital-intensive, and the hunger for data continues.

Telecom Big Data is Nothing New

Big Data is not new to telecom. Information about calls, or call-detail records (CDRs), has been collected for monitoring and billing purposes for decades. High-speed performance has been a requirement, not a nicety, for telecom. As a result, in-memory databases were first deployed by telecoms in the 1980s.

The scale of these collections became clear when the NSA’s MAINWAY database came to light in 2006. According to estimates made at the time, MAINWAY contained 1.9 trillion records. A given carrier probably stored more than this, depending on how much history was maintained. A 2003 listing of large databases included AT&T’s Security Call Analysis and Management Platform (SCAMP). At the time, AT&T was keeping two years of call data, which amounted to 26.3 TB, running on AT&T’s Daytona database. A dated report on Daytona, which called it a “data management system, not a database,” indicated that in 2007 Daytona was supporting 2.8 trillion records.

In 2007, it was becoming clear that the 2005 Sprint – Nextel merger was going to be a financial calamity for Sprint. What else has changed in the telco landscape over the past seven years or so?

That was Then. Now Cometh the Petabyte.

As Christina Twiford, a member of T-Mobile’s IBM project, said in 2012, data volumes grow quickly. The Netezza platform she discussed had to scale from 100TB to 2 petabytes. The operation she described loaded 17 billion records daily and was supported by 150,000 ETL or ELT processes. Access to this data was provided to 1,300 customers through a web interface.

“Hadoop is exciting in that you can throw in the structured and the unstructured data for advanced analytics. . . And you can process streams of data without ‘landing’ the data. . . Telcos are all over M2M or the Internet of Things,” she said.

Tim Eckard of Zaloni presented some telecom use cases at a 2013 Telecom Analytics conference in Atlanta. One unnamed wireless operator had a Central Data Mediation Archive used for e-discovery, billing disputes and the like. The archive totaled about 1PB and was growing at 12TB/day in 2012; the growth rate roughly doubled to 25TB/day (5B records) in 2013. According to Eckard, the customer was able to move to a Hadoop environment for one quarter of the cost of moving to a traditional (relational?) database setting.
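
For a sense of scale, the 2013 figures quoted above imply an average record size and a time to add another petabyte. The back-of-envelope sketch below assumes decimal units (1 TB = 10^12 bytes) and a constant daily rate, neither of which the source specifies; the function names are illustrative only.

```python
# Back-of-envelope arithmetic on the archive figures quoted above.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 PB = 10**15 bytes).

TB = 10**12
PB = 10**15


def avg_record_bytes(daily_bytes: int, daily_records: int) -> float:
    """Average size of one record, in bytes."""
    return daily_bytes / daily_records


def days_to_add(target_bytes: int, daily_bytes: int) -> float:
    """Days of ingest needed to add target_bytes at a constant daily rate."""
    return target_bytes / daily_bytes


if __name__ == "__main__":
    daily_2013 = 25 * TB      # 25 TB/day in 2013
    records_2013 = 5 * 10**9  # 5B records/day in 2013
    print(avg_record_bytes(daily_2013, records_2013))  # 5000.0 bytes per record
    print(days_to_add(1 * PB, daily_2013))             # 40.0 days per additional PB
```

Under these assumptions, each record averages about 5 KB, and the archive adds a petabyte roughly every 40 days at the 2013 rate.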

Pervasive described similar results with a smaller telco provider whose Call Descriptor Records were captured with Pervasive’s DataRush. Telco switches dump data every 5 minutes, hundreds of thousands of records per dump, so the telco’s objective was to increase its speed of processing by an order of magnitude. This was essential in order to improve decision processes, measures of real-time profit margin and operational analysis.
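
To make the idea of processing one switch dump concrete, here is a minimal sketch in plain Python. It is not Pervasive’s DataRush and does not reflect that product’s API. The record layout (`cell_id`, `revenue_cents`, `cost_cents`) and the sample rows are hypothetical, chosen only to show how a per-dump margin measure might be computed.

```python
import csv
import io
from collections import defaultdict

# Hypothetical dump layout: these column names and rows are illustrative,
# not a real switch format. Amounts are integer cents to avoid float drift.
SAMPLE_DUMP = """cell_id,duration_s,revenue_cents,cost_cents
A1,60,50,20
A1,120,100,40
B7,30,25,30
"""


def margin_by_cell(dump_text: str) -> dict:
    """Sum (revenue - cost) per cell for the records in one dump."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(dump_text)):
        totals[row["cell_id"]] += int(row["revenue_cents"]) - int(row["cost_cents"])
    return dict(totals)


if __name__ == "__main__":
    print(margin_by_cell(SAMPLE_DUMP))  # {'A1': 90, 'B7': -5}
```

In a real deployment the same aggregation would run on every 5-minute dump and across many switches in parallel, which is where a distributed engine earns its keep.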

IBM’s Robert Segat likewise suggested that telcos would be using analytics for purchasing and churn propensities. The challenges telcos face (disruptive competitors, LTE / 4G, increased regulation, support for streamed data) would each be an opportunity for Hadoop and Hadoop-like platforms. In particular, wireless data is expected to grow by 50X over the next decade. There are also 30 billion RFID sensors, customer social network information and a greater variety of traffic types, especially video.

What, Me Archive?

According to Hortonworks, its Hadoop distro is being used at seven US carriers. Despite the volume, which BITKOM’s Urbanski indicates in Gigaom is around 10 million messages per second, a telco can save six months of messages for later troubleshooting. This is both an operational and a competitive advantage if it can reduce churn.
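
To put the retention claim in numbers, the sketch below counts the messages accumulated over six months at the quoted rate. It assumes a constant 10 million messages per second and treats six months as 180 days; both are simplifications. No message size is assumed, because the source gives none.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def messages_retained(rate_per_second: int, retention_days: int) -> int:
    """Messages accumulated at a constant rate over the retention window."""
    return rate_per_second * SECONDS_PER_DAY * retention_days


if __name__ == "__main__":
    # Assumptions: constant 10M messages/s, six months taken as 180 days.
    print(messages_retained(10_000_000, 180))  # 155520000000000
```

That is on the order of 156 trillion messages held for troubleshooting.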

Leading edge solutions include using Hadoop data to allocate bandwidth in real time. This opportunity is appealing to telcos because service level agreements may require dynamic adjustments to account for malware, bandwidth spikes and QoS degradation. Such solutions likely involve all facets of Big Data: volume, velocity and variety of data sources.

Forrester’s Brian Hopkins identified big data for real-time analytics as the number 2 “most revolutionary technology” according to a 2013 survey of enterprise architects.

This optimistic picture clearly omits some of the detailed steps needed to fully deploy a telco Hadoop application. Far from being a greenfield application, most telco Big Data deployments will be bigger and faster with more data sources. But how is that data to be loaded into Hadoop?

Expectations for data pipeline speed are high. ETL toolkits must play nicely with the Hadoop ecosystem. Its proponents in telco, or elsewhere for that matter, have high expectations for developer ease of use and execution-time performance. Convenience offerings, typified by Syncsort’s Mainframe Offload, allow Hadoop to coexist with legacy mainframe tools like COBOL copybooks and JCL, without the need for additional software on mainframes. For telcos considering AWS, projects can be rapidly provisioned in EC2 and connect to ETL data sources in RDBMS, mainframe, HDFS, Redshift or S3.

From CDR to CSR

If your phone rings a few seconds after your next dropped wireless call, it may well be a customer service agent with access to real-time analytics to aid in troubleshooting the problem.

Assuming you’ve paid your bill. They’ll know that, too.

Authored by Mark Underwood

Syncsort contributor Mark Underwood writes about knowledge engineering, Big Data security and privacy.
