Innovation, and what succeeds in the market, is an endlessly interesting topic. I was reminded of this recently when I read a New Yorker magazine profile of Clayton Christensen, the business guru most famous for his work “The Innovator’s Dilemma.” The profile extends beyond his work: it covers his family background, his battle with cancer, his religious faith, and more. In all, it is a fascinating and inspiring profile that I highly recommend. At the moment, it’s behind a subscription wall, so if you have access you can get it here, or you can read it in the May 14, 2012, print edition.

Christensen’s notion of “disruptive innovation” applies across any industry. An interesting example is perhaps Christensen’s most famous “miss” about the iPhone, which he predicted would not succeed because it was just a fancy cell phone. What he realized later, after its phenomenal success, was that the iPhone was actually disruptive to laptops, not just to other cell phones. A great insight, albeit after the fact.   

All of this got me thinking about changes in the backup world in the past few years, particularly two disruptive technologies, deduplication and snapshots.

Deduplication first made its mark in the form of deduplication appliances, single-purpose devices that were highly disruptive to tape as a backup target. Disk had long been used for backup, whether as plain disk or in the form of a VTL, but it remained a niche methodology because it was just too expensive. As a result, disk was limited to only a day or two of data retention, if used at all. Deduplication radically changed the economics by providing data reduction rates of 90% or more, which is another way of saying you could get ten times as much use, and potentially twenty times at a 95% rate, out of the same amount of disk.
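
The arithmetic behind reduction rates is simple enough to sketch. A data reduction rate of r means the same disk holds 1 / (1 - r) times as much backup data, so a 90% rate corresponds to ten times and a 95% rate to twenty times. The function name below is purely illustrative:

```python
def capacity_multiplier(reduction_rate: float) -> float:
    """Return how many times more logical data fits on the same disk."""
    if not 0 <= reduction_rate < 1:
        raise ValueError("reduction_rate must be in [0, 1)")
    return 1.0 / (1.0 - reduction_rate)


# Compare a few reduction rates side by side.
for rate in (0.50, 0.90, 0.95):
    print(f"{rate:.0%} reduction -> {capacity_multiplier(rate):.0f}x capacity")
```

Note how quickly the multiplier grows near the top of the range: the last few percentage points of reduction matter far more than the first fifty.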

It changed the face of backup as far as tape was concerned, but interestingly, deduplication was not disruptive to the backup process. Users started replacing tape drives with disk, but everything else stayed the same. In the end, deduplication appliances were disruptive to only a portion of the backup process at the very end of the line. They were evolutionary, not revolutionary.

Snapshots have the potential to be truly revolutionary because they disrupt the entire traditional backup process, changing it from end to end, not just at the final step in the chain. But even though snapshots have been around for a long time, they are still not the leading way to protect data, despite all their advantages of speed and performance. A survey by UBM TechWeb (commissioned by Syncsort) showed only 25% of users made use of primary storage snapshots (you can get the full survey here).

Why the limited uptake? A few key reasons: 

  • Cost: snapshots are typically done on primary disk, which is expensive.
  • Performance: many disk arrays suffer significant performance degradation as snapshots accumulate.
  • Complexity of restore: snapshots are great at capturing data, but a lot of disk systems do not have convenient, easy-to-use workflows for recovering data, do not have a catalog, etc.
  • Limited retention time: because they are expensive, you normally can’t keep weeks or months of data on snapshots.

Maybe this is why snapshots haven’t been as disruptive to traditional backup as might have been expected. So are snapshots destined to remain a limited use option, typically relegated to tier-1 applications and short retention times?

Not at all! There’s a disruptive technology in town now, and it’s called NetApp Syncsort Integrated Backup (NSB). How does NSB change things? It is quite simple. NSB takes the snapshots off the primary storage and puts them onto secondary storage, and then overlays them with easy recovery workflows and a catalog. This seemingly simple change in the design addresses all of the key reasons listed above for limited uptake.

I’ve written about this before here if you’re interested in more specifics.

For now, I will conclude with a concept from Clayton Christensen, who describes consumer product selection as people looking for a way to get “jobs to be done.” Simply put, people don’t want products, they want to get something accomplished. The IT world is no different. None of us want backup software, really. What we want is for data to be protected and easily recoverable in a way that is cost-effective and reliable, and doesn’t demand too much of our attention. This is exactly what NSB delivers, as we heard recently from a user. It can do the same for you.


When we announced our DMExpress Hadoop offering, we shared a set of results from benchmark testing that had been completed. Testing has continued since, and I wanted to dedicate this post to sharing some of those results.

We did a series of tests that distill down to:

  • TeraSort benchmark (if you’re not familiar with this benchmark, it is worth looking up online)
  • Aggregation based on TPC-H generated data (aggregated on order id for line item data)
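
For readers who haven’t worked with TPC-H: the line item table holds one row per item on an order, so aggregating on order id collapses those rows into one summary row per order. The tests’ exact measures aren’t listed here, so the sketch below assumes a simple sum of quantity and extended price. The column names follow the TPC-H schema, the sample rows are made up, and the function name is illustrative:

```python
from collections import defaultdict

# Made-up sample rows: (l_orderkey, l_quantity, l_extendedprice)
line_items = [
    (1, 17, 21168.23),
    (1, 36, 45983.16),
    (2, 38, 44694.46),
    (3, 45, 54058.05),
    (3, 49, 46796.47),
]


def aggregate_by_order(rows):
    """Sum quantity and extended price per order key."""
    totals = defaultdict(lambda: [0, 0.0])
    for order_key, quantity, price in rows:
        totals[order_key][0] += quantity
        totals[order_key][1] += price
    # Round the price totals to cents for readable output.
    return {key: (qty, round(price, 2)) for key, (qty, price) in totals.items()}


for order_key, (qty, price) in sorted(aggregate_by_order(line_items).items()):
    print(order_key, qty, price)
```

In a MapReduce job the grouping key is typically what gets shuffled between the map and reduce phases, which is why compression in the shuffle step is one of the variables worth testing.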

We varied two things in the tests:

  • Compression in the shuffle step: no compression, GZIP
  • Data volume: We ramped up to 4TB on the TeraSort and 600GB on the Aggregation

The tests were done on a 10-node cluster running CDH3u2 (based on Apache Hadoop 0.20.2).

The results were very interesting, but not surprising.  For TeraSort:

  • No compression:
    • While DMExpress was faster for smaller data volumes (under 1TB), the elapsed times were still small – 15.12 minutes for native sort vs. 11.93 minutes with DMExpress for 500GB
    • When you pump up the data volumes, DMExpress really outperformed the native sort – 240.48 minutes for native sort vs. 144.18 minutes with DMExpress for 4TB. That’s a 40% reduction in elapsed time, or about 1.7x faster. That was consistent for 1TB and 2TB, as well
  • GZIP compression, where DMExpress was consistently 2x or more faster:
    • 20.82 minutes for native vs. 8.98 minutes with DMExpress for 500GB
    • 223.82 minutes vs. 84.72 minutes with DMExpress for 4TB, more than 2x faster!
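
If you want to check the ratios yourself, here is a small sketch that turns the elapsed times quoted above into a speedup factor and a percentage reduction in elapsed time. The function names are illustrative only:

```python
# Elapsed minutes quoted above: (native sort, DMExpress)
terasort = {
    "500GB, no compression": (15.12, 11.93),
    "4TB, no compression": (240.48, 144.18),
    "500GB, GZIP": (20.82, 8.98),
    "4TB, GZIP": (223.82, 84.72),
}


def speedup(baseline: float, improved: float) -> float:
    """How many times faster the improved run is than the baseline."""
    return baseline / improved


def time_saved(baseline: float, improved: float) -> float:
    """Fraction of the baseline elapsed time that was eliminated."""
    return (baseline - improved) / baseline


for label, (native, dmx) in terasort.items():
    print(f"{label}: {speedup(native, dmx):.2f}x faster, "
          f"{time_saved(native, dmx):.0%} less elapsed time")
```

For the 4TB uncompressed run this gives roughly 1.67x and a 40% reduction in elapsed time; both GZIP runs come out above 2x.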

For the Aggregation, we wrote the same aggregation logic in Java, Pig and DMExpress (a key benefit with DMExpress is using a GUI rather than coding, but this post is focused on performance). The compression results were consistent across the board with the non-compression results, so I will just give you the results using GZIP:

  • 150GB of data
    • Java: 2.4 minutes
    • Pig: 2.92 minutes
    • DMExpress: 1.18 minutes
  • 600GB of data
    • Java: 7.89 minutes
    • Pig: 11.15 minutes
    • DMExpress: 4.07 minutes

DMExpress is roughly 2x faster than Java, and consistently more than 2x faster than Pig.

What does that mean for you? It means that you can do more with fewer nodes, which has implications for both the CapEx and the OpEx of your cluster. Simply stated, you can process more data with the cluster you already have available. If you happen to be running on a public cloud, faster processing times also mean less usage time.
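
To put a rough number on the public cloud point, here is a sketch using the 4TB GZIP TeraSort times on the 10-node test cluster. It assumes usage is billed in proportion to nodes multiplied by elapsed time, which is a simplification; real pricing models vary. The function name is illustrative:

```python
NODES = 10  # cluster size used in the tests


def node_minutes(nodes: int, elapsed_minutes: float) -> float:
    """Cluster usage, assuming cost scales with nodes x elapsed time."""
    return nodes * elapsed_minutes


native = node_minutes(NODES, 223.82)    # 4TB TeraSort, GZIP, native sort
dmexpress = node_minutes(NODES, 84.72)  # same job with DMExpress

print(f"native: {native:.1f} node-minutes")
print(f"DMExpress: {dmexpress:.1f} node-minutes")
print(f"saved: {native - dmexpress:.1f} node-minutes")
```

The same arithmetic works in the other direction: at a fixed elapsed-time budget, a faster job needs proportionally fewer nodes, provided the workload scales roughly linearly.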

If you have any questions or want to learn more, please feel free to leave a comment.


When it comes to evaluating technology, nothing speaks louder than the voice of the customer. Vendors can say what they want about a product, but what matters is how it works in the day-to-day world of IT, where everything that can go wrong sooner or later does.

Recently, Syncsort and NetApp jointly sponsored a webinar that featured a user of the NetApp Syncsort Integrated Backup (NSB) solution.  Fernando Mejia is the Senior Manager of IT Infrastructure for IPC, which is a Franchisee Purchasing Cooperative for the SUBWAY restaurant stores. IPC helps the 28,000 SUBWAY restaurants in the U.S. and Canada reduce costs by leveraging their collective purchasing power. IPC supplies everything from food to paper goods to IT processes.

Mr. Mejia was kind enough to join us on a webinar that you can view here. A brief registration is required, but it’s well worth it.  And here’s a tip: the first part of the webinar is me talking. If you’re familiar with the NSB story then you can jump to the 25 minute mark where Mejia begins speaking.  It takes a minute or two for the webinar to load up.

I want to give you a sense of what IPC gained by moving to NSB. Their environment is about 350 servers and 70 TB of primary storage, most of it on NetApp FAS 6280 systems. They use NSB to back up that data to a clustered FAS 3160, which is dedicated to backup. Prior to NSB they were using Symantec NetBackup and having major headaches. Nightly incrementals started at 6:00 p.m. and finished up around 6:00 a.m. Weekend fulls started Friday night at 6:00 p.m. and lasted until Monday morning.

This led to problems, as Mejia said:

“It was always a challenge hoping and praying there weren’t any kind of gotchas, like there always are with backups, that would cause that window to extend. And often it did extend beyond the window and ran into standard business hours. And often times depending on which systems were affected we did have an impact on the performance of our systems, and users had a problem with productivity.”

Not only that, but management was a burden.

“We did have one full-time resource dedicated to just managing backups. That person’s sole purpose was to, in essence, babysit the backup process and make sure we were getting successful backups. It was a very labor intensive process.”

Some new applications that were coming on-line and would significantly extend production hours pushed IPC to look for a solution “to meet our needs, particularly the one need of being able to potentially completely eliminate a backup window.”

They got it with NSB.  The average backup time for their servers is now between 1 and 15 minutes! The backup window is a non-issue. In addition, they gained a significant benefit from the NSB Instant Virtualization capability.  It’s not only useful for recovering systems, but it has dramatically enhanced IPC’s application development efforts.

“We have an in-house development staff and we do develop a good majority of the applications we use…  We’re able to leverage Instant Virtualization to bring entire applications, that are composed of multiple servers, from production into our development and staging environment. This minimizes the drift between development staging and production, in turn resulting in much quicker time to develop applications, much more streamlined testing and QA processes, and in the end a lot less issues and problems that make it out into production.”

That’s how NSB leverages the power of snapshots – launch entire applications in minutes using your most recent backup data. And because it’s using NetApp FlexClone to do it, there’s no additional storage required other than new writes to the system.  

Maybe the best part of the new solution, however, was the management relief. Rather than the full-time IT resource required before, with NSB:

“It takes barely an hour a day to go over, manage and maintain the entire solution…  That was a great win, because I can re-assign those resources that were basically just doing maintenance and operational type work and put them into more important tasks and projects that are more crucial to the organization.”

How is this achieved?  Partly it’s how easy the solution is to use.

“NSB’s capabilities of leveraging NetApp technologies as well as being entirely integrated into virtualization technologies allowed us to collapse the amount of administration interfaces into one. That’s one great benefit of the solution. The second great benefit is that it’s extremely intuitive. It’s very easy to get in front of the interface and through a very short learning curve understand how the backup jobs are configured, understand how to perform recovery operations, understand how to generate reports. In the end that resulted in really lowering our operations overhead.”

The other key is reliability. A great deal of backup management ends up being trouble-shooting and scrambling to recover from failed backup jobs. Not with NSB.

“I must say that our job failure rate is extremely low. If I get maybe one or two a week that’s a lot. And usually when we get a job failure it’s a problem with a particular server that was getting backed up. So over time we ended up not having a lot of focus on the backup solution because it just works so well… I remember in our NetBackup days I was hyper-focused on all kinds of detailed information on the backup because it was critical that you were on top of it all the time to make sure it was functioning correctly. With the Syncsort solution that really has changed.”

There’s more I could write, but I’ll leave you to listen to the webinar where you can hear it for yourself. If you have follow-up questions, the webinar will explain how you can get them to me, or just post a comment here.


At the end of March, Syncsort hosted a tweetchat to celebrate World Backup Day.  We were joined very actively by Jon Toigo who made a lot of thought-provoking comments. I blogged a bit on this earlier, here, but wanted to get back to some of Jon’s comments.

During the #backupjam, ESG analyst Jason Buffington launched various questions to solicit comments. For the sake of readability, I’m going to run together some of Jon’s tweets, but otherwise these are his verbatim responses. And then I’ll follow with some short comments of my own. This list is not comprehensive, and if you want to review the original tweets, you can reference Jon’s tweets on March 29, 2012.

Q: What are the top backup challenges facing customers?

@JonToigo: Identifying appropriate bu techniques based on poorly or un-defined restore targets.

My comment:  Jon makes a great point that you have to start with restore in mind. What are you really trying to achieve? From there, you can begin architecting a solution.   

Q:  How has virtualization impacted data protection strategies?

@JonToigo: My bigger concern, server virt vendors claim bu unneccesary. Just HA failover. Not true!  I have been told this over and over and so have my customers. It is wrong-headed.

My comment:  Completely agree! It seems this “all you need is failover” idea springs up every now and again, and it’s never the solution. Failover and HA schemes are there to keep applications running when hardware dies (or somebody pulls a plug). They do nothing to save you when you lose data at the logical level. 

Q: What’s the answer to the broken state of backup?

@JonToigo:  Depends entirely on what’s breaking it.  First, you need to get past the politics. Lose the Tape Sucks Move On bumper stickers for a start.

My comment: Love this response. It’s so Toigo!  First, the obvious fact: you can’t find the answer unless you know what the problem is.  Then the shift into a related issue, what you might call “sloganeering.”  While “tape sucks” might work as pay-attention-to-me style marketing, if you use it as a starting point in your solution design, you may very well be writing off a critical component.  Tape may not be the latest thing, but it remains a key part of many data protection strategies and dismissing it up front is foolish.  

Q: What are some of the main causes for lost data?

@JonToigo: User error. Malware. App error. HW error and Facility faults. In roughly that order.  And blowing through the question “Are you sure?”

My comment: A good list, and important to note that there is very little that hardware redundancy can do to help you with user error, malware or application error. You’ve got to have backup in place to deal with these issues. The final comment speaks to the issue of testing. “Are you sure?” is a simple question, but not at all easy to answer when you’re talking about your backup environment.  

Thanks again to Jon, Jason and everyone else for participating in the #backupjam. We really enjoyed it and look forward to organizing and participating in others like it in the future.
