emc

It’s that time of year again, when we look into our rearview mirrors and our crystal balls.  I’ll save my 2012 data protection predictions for a future blog post, although I will note that my “100% Accurate, Guaranteed!” predictions for 2011 all came true!

For now, I want to reflect on the past year of blogging. Rather than picking the topics myself, I’ll let the blog readers do it.  By that, I mean I will now reveal the top five most read data protection blog posts on the Syncsort blog in 2011 as calculated by Google Analytics.

#5 Are Snapshots Backups? Yes Indeed!

I’ve done a number of posts around snapshots, and will continue to do so as I think they are critical to data protection. This post may have drawn a few extra eyeballs because at the time I was involved in a bit of back and forth with EMC blogger Mark Twomey, better known as Storagezilla.  Mark continues his opinionated and always-interesting blogging, though I haven’t had any reason to respond to him lately. However, I’m still watching!

#4 Talking with Netgain at VMworld 2011

Scott Baynes, CTO of Netgain, tells a great story about how his company, a service provider, is gaining serious data protection benefits from the NetApp Syncsort Integrated Backup (NSB) solution.  I really like these customer testimonial videos because the best advocates of NSB are the people who use and rely on it every day.  If you like Scott Baynes’ video, check out this video with Campbell Alliance, also filmed at VMworld.

#3 Getting VMware Alignment Right

VMware alignment is a nice technical topic, the kind of thing I’d like to do more of in 2012.  I spent a lot of time in 2011 blogging about the higher-level benefits of NSB.  I’ll continue to do that, of course, but I think it’s time to dig a bit deeper into some topics.  This is also why we recently opened up our Syncsort Community site to let Syncsort users and partners have detailed technical discussions around our products.  Join us!

#2 EMC Replaces 3 Solutions with 4

I have to say, this is a personal favorite!  I had a lot of fun with it, and I hope I drew some compelling comparisons between the EMC data protection portfolio and NSB.  At the time, I made the statement that NSB is beating EMC 80% of the time when customers conduct product evaluations. That was back in May, and we’re still maintaining that win rate.  That said, the folks at EMC are formidable competitors and they continue to work on their products. Of course, we do as well and I’m sure 2012 will find us knocking heads more than once. We might have a surprise or two in store, as well.

#1 P2V Migration in 10 Minutes

Perhaps it was the title that drew people in, with its “I can’t believe they can do that” effect. It’s true that NSB does indeed let you do 10 minute P2V migrations along with its data protection capabilities. What I find most interesting is how people continue to struggle with migrating to virtual machines despite the wide array of tools available and the fact that nearly everyone has at least some experience with the process by now. However, there’s always a better way of doing things, especially if you can do them in only ten minutes.

A sincere thanks to all of the loyal readers of our blog so far in 2011. We look forward to sharing our thoughts and exchanging ideas with you in the weeks ahead and as we move into 2012.

{ 0 comments }

Does Dedupe Still Matter?

October 12, 2011

I recently watched a video blog from ESG Senior Analyst Lauren Whitehouse, which she posted from SNW in Orlando.  In the video, Lauren discusses a presentation she made on deduplication and makes several interesting observations that really got me thinking.

First, she was surprised at how many people were in attendance, given that deduplication is yesterday’s news to an extent. Plus nearly everyone in the audience was already using dedupe technology. So why were they there?

“People are curious about what’s next,” Lauren says.  And the discussion has changed. I had to grin when Lauren said, “Just a couple of years ago… the big discussion was about in-line vs. post-process,” because I remember those arguments really well — I was making them myself (with a previous company).

Those were the days when Data Domain was staking out the in-line turf with an argument that was basically “in-line is the only way, the rest of you are stupid.”  Then they got bought by EMC and became a little arrogant according to some.

Now that the dust has settled, most vendors offer a mix of modes and the real answer to the argument is “it depends,” which is exactly what the smarter observers were saying at the time (I recall Curtis Preston being one of them).

In any case, the world has moved on and the discussion along with it. Lauren notes that there are so many options now, so many different places to deploy dedupe. It’s not just disk targets anymore. Lauren notes:  “When you look at the number of solutions that are available: hardware, software, primary storage, backup storage, to the cloud, there’s just so many different things that have to be evaluated. It’s confusing.”  

Confusing it is. And that’s one reason we’ve been focused on simplicity and completeness with NetApp Syncsort Integrated Backup (NSB). It may seem a bit of a paradox – don’t you get less complete as you get more simple? Not necessarily. Completeness is about having all the things you really need, while simplicity is about making them easy to use and as transparent as possible. Ideally, you strike a balance between the two.

With NSB, we’ve taken the notion of data reduction and inserted it across the backup process.  Note that I say “data reduction.”  Deduplication is a specific technology approach, while data reduction is the goal. NSB starts the process at the server by using a block level backup method that’s designed to never read and copy the same data twice (we use our own technology with our Agents, and leverage VMware Changed Block Tracking when using Agentless backups).  This gets you the data reduction without the impact of deduplication, which relies on reading all your data, crunching a bunch of hashes and comparing them. And then doing it again the next time you back up.  Dedupe at the server doesn’t make sense. You are stealing resources and creating impact in the last place you want to do that.
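To make the difference in server-side work concrete, here is a minimal sketch in Python. It is a toy model, not NSB or any vendor's actual implementation: the block size, the function names, and the idea that a tracker simply hands us a set of changed block numbers are all assumptions made for illustration. It counts how many bytes each approach has to read on the server to protect the same one-block change.

```python
import hashlib

BLOCK_SIZE = 4  # tiny blocks keep the example readable (assumption)


def split_blocks(data: bytes) -> list[bytes]:
    """Cut a volume image into fixed-size blocks."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def hash_dedupe_backup(volume: bytes, seen_hashes: set[str]) -> tuple[int, int]:
    """Source-side dedupe: read and hash every block, send only unseen ones.

    Returns (bytes_read, bytes_sent).
    """
    bytes_read = bytes_sent = 0
    for block in split_blocks(volume):
        bytes_read += len(block)
        digest = hashlib.sha256(block).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            bytes_sent += len(block)
    return bytes_read, bytes_sent


def changed_block_backup(volume: bytes, changed: set[int]) -> tuple[int, int]:
    """Change tracking: read and send only the blocks flagged as changed.

    Returns (bytes_read, bytes_sent).
    """
    blocks = split_blocks(volume)
    size = sum(len(blocks[i]) for i in sorted(changed))
    return size, size


day1 = b"AAAABBBBCCCCDDDD"
day2 = b"AAAABBBBXXXXDDDD"  # only block 2 changed overnight

seen: set[str] = set()
hash_dedupe_backup(day1, seen)                      # baseline backup
dedupe_read, dedupe_sent = hash_dedupe_backup(day2, seen)
cbt_read, cbt_sent = changed_block_backup(day2, {2})

print(dedupe_read, dedupe_sent)  # 16 4
print(cbt_read, cbt_sent)        # 4 4
```

Both approaches send the same four bytes, but the hashing approach had to read and hash the whole volume to find them. Scale the volume up to terabytes and that read is the impact I'm talking about.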

Target dedupe is much more viable because you’ve got hardware designed to handle the load.  NSB puts the deduplication where it belongs, at the disk target. Though since NSB creates little duplicate data you don’t get very high dedupe rates. Most of the work is already done. The reason you see 95%-plus dedupe rates from other approaches is that you dump so much duplicate data into your target only to get rid of it. What a waste of effort.  You can learn a lot more about this topic in our Beyond Deduplication white paper if you are interested.
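A back-of-the-envelope sketch shows why the headline dedupe rate depends so heavily on what you feed the target. The numbers below (a 10 TB data set, 2% weekly change, a year of weekly fulls) are assumptions picked for illustration, not measurements from any product.

```python
def dedupe_rate(ingested_tb: float, stored_tb: float) -> float:
    """Fraction of ingested data the target discarded as duplicate."""
    return 1 - stored_tb / ingested_tb


FULL_TB = 10.0        # assumed size of the protected data set
WEEKLY_CHANGE = 0.02  # assumed fraction of the data that changes each week
WEEKS = 52

# Unique data over the year: one baseline plus each week's changes.
unique_tb = FULL_TB + FULL_TB * WEEKLY_CHANGE * (WEEKS - 1)

# Repeated fulls: the whole data set lands on the target every week.
fulls_ingested_tb = FULL_TB * WEEKS

# Changed blocks only: what arrives is already just the unique data.
incremental_ingested_tb = unique_tb

fulls_rate = dedupe_rate(fulls_ingested_tb, unique_tb)
incremental_rate = dedupe_rate(incremental_ingested_tb, unique_tb)

print(f"repeated fulls:      {fulls_rate:.1%}")
print(f"changed blocks only: {incremental_rate:.1%}")
```

Under these assumptions both targets end up storing the same amount of data. The high rate in the first case only measures how much duplicate data was shipped there in the first place.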

So does dedupe still matter? Of course it does. Data reduction is what makes disk-to-disk backup economically feasible, and deduplication is part of the data reduction process. Not all of it, necessarily, but part of it.

Lauren Whitehouse makes the point that in the last few years data reduction has moved down in terms of IT projected spending priorities, but it’s still solidly in the top ten initiatives. It’s becoming a standard part of IT.  As she notes, “Data growth is not stopping. It’s a continuous pain point for everyone. So I think it’s going to continue to be a high priority just in the face of how do I deal with all this data?” 

We feel the best way to deal with data growth is comprehensively, from the server to the target. Squeeze out efficiencies wherever you can, in a way that minimizes impact and resource consumption.

{ 2 comments }

Last week I got involved in a little back and forth with blogger Mark Twomey, who blogs under the title of Storagezilla. Mark is an EMC employee, but the blog represents his opinions, not EMC’s.

Storagezilla is a good name for him.  He’s a feisty partisan who likes to stomp on cars, especially if those cars say “NetApp” on the side, or the name of other EMC competitors. I really enjoy his blog, but that’s not to say I always agree with his point of view.

Last week in a post called “Snapshots are not Backup,” he put up some comments on NetApp’s recent SnapProtect announcement and the notion of whether snapshots alone are valid as a backup mechanism.  I responded from the point of view of NetApp Syncsort Integrated Backup (NSB), which works differently.  You can read the back and forth, but I wanted to just state a few general points about using snapshots for backup.  (Side note: there is another interesting back-and-forth going on between ‘Zilla and industry guru Curtis Preston, which you can read here.)

There are a few common objections that people make when arguing that “Snapshots don’t work as backups.”  Let’s look at a few.

Objection: Because the snapshot depends on the primary storage, if the primary fails the snap fails, so it’s not a backup.

True enough for primary-based snapshots. But NSB does something very different: it uses snapshots on secondary storage without the need to even have primary snapshots. That’s why it can support any primary storage (doesn’t have to be NetApp) and still store backup jobs as NetApp snaps, with all the great speed of recovery advantages snaps give you. Your primary storage can melt into a slag heap and it won’t affect your backup data.

Objection: Snapshots take up disk space and if you run out of space unexpectedly, your applications will fail. Too risky!

Again, true enough if you are doing snaps on primary, but NSB doesn’t.  If an NSB volume filled up, the worst that could happen is a backup job would fail.

Objection: Snapshots are very large objects and they don’t have good restore granularity.

Could be true in some cases, but with NSB your snapshots are cataloged and searchable, so if you need to find every version of filexyz.doc across dozens or hundreds of snaps, you can do it in seconds. You can use wildcards, file size and date ranges, etc.  Or, if you just want to browse the data, you can mount up a snap in about a minute to whatever server you want (assuming the same OS) and just poke around in the file system. And because everything is on secondary storage, none of this impacts your production in any way.
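For readers who want to picture what a searchable snapshot catalog looks like, here is a small Python sketch. The `CatalogEntry` record, the `search` function, and the sample entries are all hypothetical; this is not the NSB catalog format or API, just the general idea of filtering file versions by wildcard, size, and date.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class CatalogEntry:
    """One file version recorded in one snapshot (hypothetical schema)."""
    snapshot: str
    path: str
    size_bytes: int
    modified: date


def search(catalog, pattern, min_size=0, since=date.min):
    """Return entries whose file name matches the wildcard pattern
    and that meet the size and date filters."""
    return [
        e for e in catalog
        if fnmatchcase(e.path.rsplit("/", 1)[-1], pattern)
        and e.size_bytes >= min_size
        and e.modified >= since
    ]


catalog = [
    CatalogEntry("snap_mon", "/docs/filexyz.doc", 1000, date(2011, 5, 2)),
    CatalogEntry("snap_tue", "/docs/filexyz.doc", 1200, date(2011, 5, 3)),
    CatalogEntry("snap_tue", "/docs/notes.txt", 300, date(2011, 5, 3)),
]

all_versions = search(catalog, "filexyz.*")
recent_docs = search(catalog, "*.doc", since=date(2011, 5, 3))

for entry in all_versions:
    print(entry.snapshot, entry.path, entry.size_bytes)
```

The point is that the lookup runs against the catalog, not against the snapshots themselves, which is why it can be fast and why it never touches production.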

Objection: Even if you move the snap to secondary disk, that disk could fail or you could lose the site and lose all your backup data.

Well yes. But that’s not an objection so much as a design goal.  Anything can fail. And if you only keep one instance of a backup on one disk array, of course you could lose the backup (or lose all your backups in the event of a site disaster).  That is why NSB includes both disk-to-disk replication and tape output. Ideally, you want that third copy of data somewhere else to avoid loss if a site goes down.

You could present more objections (feel free to drop any in the comments, and I’ll respond to them), but to sum it up, the simple but powerful fact that makes NSB different is that we take the data from any primary storage and store it as snapshots without any dependency on that primary.  As soon as you break that link between the primary storage and the snapshot, you eliminate most of the core objections around using snapshots as backups.

So can you use snapshots as backups?  With NSB you can. If anybody knows of a reason why it’s not perfectly feasible, I’d like to know.

{ 0 comments }

The folks at EMC have been getting a lot of mileage lately over a particular user story where they deployed Networker, Data Domain and Avamar. They’ve promoted it in a press release and have even generated some press coverage.

Now when I read these items, I had multiple “hmmm” moments. Those are moments where I think, “Hmmm… this could have been done a lot better with NSB.”  In fact, EMC’s case study for its solution is, for me, a case study for our ongoing message about how users need to go “beyond deduplication” to really solve their problems.

The user in question was facing a much too common dilemma for those using traditional file backup to tape: backups were taking too long. There is no mention made of the generation of tape technology they were using, but nearly any time you replace tape with disk you’ll see better performance. There can be exceptions, but more often than not money is better invested moving to a disk-to-disk technology than upgrading to faster tape drives.

They also had a truly painful remote site tape problem:  67 remote locations were backed up using tape. I wouldn’t wish that on anybody! 

So EMC suggested moving them to a disk-based solution. So far, so good. But let’s first look at exactly what was replaced. From the press release:

CA BrightStor ARCserve and Symantec NetBackup backup applications were replaced by EMC Avamar® and EMC NetWorker®, and tape libraries and tape drives were replaced by Avamar Data Stores and EMC Data Domain® deduplication storage systems. 

Unless I’m missing something, the solution was to replace two backup software packages and tape with two backup software packages and two new disk platforms.  By my count, that’s going from three disparate solutions to four. But let’s not get stuck only on the numbers.

Seems they started by replacing the remote site backups with Avamar. OK, that makes sense. Anything to get rid of those 67 remote tape drives. It would also seemingly make sense to use Avamar everywhere, but…

[The user] then implemented EMC Data Domain deduplication storage systems with EMC NetWorker backup and recovery software to protect its Microsoft Exchange, Microsoft SQL Server, Oracle and file sharing environments.

Why would you go to different software and hardware here?  Wouldn’t you just prefer one solution everywhere? The answer is that Avamar often fails in larger data environments. One of the real challenges with client-side deduplication is that it can’t hack environments like email and databases, or even large file shares. There is just too much work to do grinding through all the byte-hashing every day. This is also why “backup time” numbers can be disingenuous.  Sure, if you only count the time you spend transmitting bytes the backup might be 20 minutes. But if you count the time hashing data and the time the CPU spikes, it can be hours.  Think of it like warming up your car for two hours preparing for a trip to the grocery store around the corner. The “trip” was five minutes, but you used a lot of gas to get there.

That’s one of the really nice things about the NSB model.  It never scans the file system. It doesn’t have to read and re-read the same data over and over. Read it once, move it once, and done. The difference in overhead is enormous, and it lets NSB handle big applications with no problems. (We recently helped a university in Texas drop its SQL backup from 48 hours “a day” to about 20 minutes.)

This also points out a huge weakness in the deduplication target model. We learn from the article that the user “backs up about 24 TB a week with EMC NetWorker and Data Domain.”

24 TB a week?  This is a perfect illustration of the problem I like to call “Data Lift.” Because Data Domain doesn’t relieve pressure on the server, you are asking your critical applications to spend the resources needed to read and move 24 TB of data every week. That’s roughly 1.2 petabytes a year of effort, 90% of which is moving the same bits again and again.
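If you want to check the “Data Lift” math yourself, it is a few lines of Python. The 24 TB weekly figure comes from the article and the 90% redundancy figure is my own estimate from above; the decimal petabyte (1 PB = 1,000 TB) is an assumption about units.

```python
TB_PER_WEEK = 24           # weekly backup volume quoted in the article
WEEKS_PER_YEAR = 52
REDUNDANT_FRACTION = 0.90  # estimate used in this post, not a measured value

tb_per_year = TB_PER_WEEK * WEEKS_PER_YEAR
pb_per_year = tb_per_year / 1000  # decimal petabytes (assumption)
redundant_tb = tb_per_year * REDUNDANT_FRACTION

print(f"{tb_per_year} TB moved per year (about {pb_per_year:.1f} PB)")
print(f"of which roughly {redundant_tb:.0f} TB is data that was already protected")
```

That works out to 1,248 TB a year, which is where the 1.2 petabyte figure comes from.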

With NSB, you stop the “Data Lift” problem at the source by moving the data once (and once only!).  Target dedupe is designed to solve a problem that backup creates. Well, if you’re looking at re-architecting your backup model, doesn’t it make sense to solve ALL your problems at once rather than just band-aid them?

Something else goes unmentioned in these stories.  There is a lot of talk about better backup times, but there isn’t a word about better RPO. That’s because even with the multiple solutions that EMC brought into the account, backup is still once a day. Nothing in the EMC technology lets you back up more frequently, like you can with NSB.  Again, if you’re tearing out everything for something new, wouldn’t you like a solution that lets you back up, say, every hour? With less impact than you create backing up once a day? And using no more storage?

NSB lets you think big, outside of the dedupe box.

There’s also recovery. Now clearly, there were good strides made in this area. The article cites a case where internal customers needed to do research on what I’m guessing were larger data sets (it’s not specified). Before, the company had an SLA of two weeks: time to fetch the tape, re-index it, dump it to storage somewhere, etc.   Now they get it done in “a number of hours” by moving data to a virtual machine.

An improvement, surely. But how about getting that virtual machine up and running in ten minutes? Or giving access to the data itself with two mouse clicks and about 60 seconds of elapsed time?  NSB can do it, because we leverage the power of NetApp Snapshot technology when we store backups. There is no need to transform data from a backup format to a usable format: it’s always usable, always right at your fingertips.

Finally, NSB is really one, integrated solution as opposed to two hardware platforms and three software modules (and nobody mentioned bare metal recovery, which would require yet another EMC product, whereas with NSB it’s just there, with BMR functionality available from every backup).  NSB will also eliminate the media servers in the environment, cutting out a layer of cost and complexity.

Maybe that’s why when NSB goes head-to-head with EMC in customer evaluations, we’re winning 80% of the time. Even in EMC shops, despite all the advantages that the incumbent has.

There really is a better way.

{ 0 comments }