October 2011

The other day I came across an interesting article in the Channel Register, a publication I read frequently. The journalist, Chris Mellor, was speculating with a backup vendor about a kind of dreamland of data protection. For some reason, it made me think of that moment in “Napoleon Dynamite” when Pedro is running for president, and his entire campaign speech is “If you vote for me, all of your wildest dreams will come true.”  

That may work as a campaign speech, but I wouldn’t want to be on the hook to deliver on that kind of promise! But what Mellor was talking about was a bit more specific. Can his backup dream come true?

The discussion centered on one of my favorite topics: using snapshots as backups. Specifically, it focused on leveraging hardware snapshots to get the job done. The problem is that hardware snapshots are vendor-specific. The article notes:

Snapshots, performed by storage array controllers and thus imposing no burden on server resources, were specific to the storage array supplier. You can’t recover a NetApp snapshot to a VNX or CLARiiON array.

Yes, good point. Sure would be nice if you could.  The article went on:

Hardware snapshots are supplier and product line specific. If they could be made multi-vendor then a hardware snapshot-based backup product could abolish server-based backup windows because a server-based application wouldn’t be running backup code. Nor would a media server. The backup code, if it existed, would merely issue commands to the storage array,

This is another interesting idea. Get rid of media servers and leverage snapshot code on the array? I like this idea. At the end, the article reaches its Pedro moment:

The idea of using storage array hardware snapshots, with possibly continuous or near-continuous data protection and, by doing so, drastically shortening or eliminating backup windows, is in tune with this trend, and it would surely have a huge appeal. Will it happen?

Will all your wildest backup dreams come true? Well, I was all set to take this to our certified smart guys in engineering and say, “This is a great idea! Let’s do it!” But then I remembered: Syncsort has been doing this in conjunction with NetApp for the past five years!

The only difference is that with NetApp Syncsort Integrated Backup (NSB) we don’t leverage snapshot technology on the primary storage. Instead, we consolidate backups onto NetApp secondary storage in order to leverage snapshots and clones for rapid recovery of data. This means you can use it across any primary environment, even for data that’s not SAN based (such as local C: drives).

There is one small way in which we fall short of the “wildest dreams” that Mellor writes about: we do run agents on servers (though not always). There are good reasons for that. Agents are still the best way to capture application-specific knowledge about the data (try backing up a SharePoint farm without agents and all you’re really doing is backing up a bunch of Windows servers). And since the NSB agents are so low impact, they avoid the real problem with agents, which is taking up too many server resources. In any case, for VMware environments you can run without agents. The notion of “agentless” can be misleading, though, because something is always happening somewhere. More on that in a future blog post.

But despite minor variations, NSB is basically doing exactly what Mellor contemplates: delivering a unified backup method that works across any primary disk environment, yet is able to leverage the speed and efficiency of snapshot technology. Not only that, but NSB uses the best snapshot technology in the known universe, which belongs to NetApp.

Mellor speculates that such a product “would surely have a huge appeal.”  We agree!  And we’re finding more and more that customers agree with us too.  As Napoleon Dynamite would say…“Gosh!”


It’s no longer news that Blackberry services were down last week, for as long as four days. I’m a Blackberry user myself and have very little good to say about the usability of my Storm 2, but this post isn’t about phones. Nor is it meant to dwell on the causes of the outage. Rather, I want to take some data protection lessons from it.

The big, obvious lesson to me is that recovery time matters. If Blackberry services had been down for an hour, it would have been annoying but it would have blown over. If they had been down for a day, it would have been bad but maybe not critical. But being down for four days is a lifetime in today’s world of non-stop data availability.

The consequences for Blackberry maker Research in Motion?  Financial damages may be “limited” to $350 million, but the damage to reputation may be far more significant.

Blackberry has definite advantages in the corporate market because they still have the best security infrastructure by far, but why would anyone in the consumer space go with Blackberry at this point?  I was surprised by my own experience last week.  I hadn’t noticed anything at first because I was in my office and therefore not looking at my phone. Then we got an email from our IT department saying there were problems, so that made me look. Sure enough, no work emails showing up. But it was worse. I have two personal accounts coming into the phone, and they were down too. So despite my email being distributed to three different providers, the Blackberry network was the single point of failure for all of them.  I’m glad I wasn’t traveling!

Which brings up an interesting thing about Blackberry: it’s a true cloud service though you don’t see it referred to that way often. And as such, it has to make you think about the whole movement to the cloud. Whether building your own cloud or looking to outside providers for cloud-services, it’s critical that you fully understand how resilient the system is and how quickly data can be recovered and services restored in the (inevitable) event that something goes wrong.

This is why we focus again and again on recovery with NetApp Syncsort Integrated Backup. You can’t build a foolproof system, so eventually you will need to recover. One of the ways we do that is through our Instant Virtualization™ feature. Instant Virtualization is like an App for getting your data and systems back online rapidly. I refer to it as an App because it’s built in, easy to use and requires minimal effort on the part of the user. No special backups are required. Any backup from any server – physical or virtual – can be converted into a new VMware virtual machine in less than ten minutes.

Think about that. Lose your email server: get it back in ten minutes. Lose a key customer-facing web server: get it back in ten minutes. Lose a SharePoint server hosting a major internal project: get it back in ten minutes. Lose a database server in the middle of quarter-end report runs: get it back in ten minutes.

Time is money. Time is reputation. Time is the difference between your customers saying “well, things happen sometimes” and “I’m never going back again.”

We all like to make jokes about how distracted we are by our mobile devices. Somebody even tweeted the following:

Thanks, #dearblackberry, I had my dinner companion’s undivided attention for the first time in 6 years.

But the truth is we rely on services being available whenever we want them, and our customers insist on it. You can’t afford to be in a situation where you can’t recover rapidly and reliably.

So is there an App for recovering from major server failures?  Yes, and it’s called NetApp Syncsort Integrated Backup.


I am currently in Orlando for the first of two events happening here over the next couple of weeks. First is the Gartner Symposium / ITxpo, and the second is TDWI, which kicks off later this month. As hard as it is to believe, this is my first time at a Gartner Symposium, and I am very much enjoying the sessions, the insight, and meeting with colleagues, partners, and the Gartner analysts.

I am concentrating specifically on the data management and data integration sessions, but I am also very interested in the sessions on cloud, big data, and Hadoop. I will also be spending time at the ITxpo, where vendors, Syncsort included, toot their own horns and try to impress the masses.

The data management and data integration sessions from analysts like Ted Friedman and Mark Beyer are particularly interesting to me. One of the benefits of my role at Syncsort is having the opportunity to regularly interact with smart analysts like Ted and Mark. It is impossible to participate in a briefing or advisory session with them and not learn something and/or engage in a spirited debate.  Specific sessions that have caught my eye at Symposium include:

I am interested in the trends that Gartner is seeing in the industry, interesting vendors and technology that have caught their attention, and most importantly what customers are sharing with them. It is always interesting to see how that last piece with customers aligns to what we are hearing from our customer base. For example, we are hearing more and more from prospects that they need to bring their transformations (the “T” in ETL) back to the ETL layer or engine. They’re experiencing performance and capacity issues with their current implementations.  More on this in the coming weeks, but in the meantime check out this video we recently put out on this issue.

Next is cloud.  I spend a lot of time talking to customers and thinking about the possibilities with the cloud from a data integration perspective. As part of this, I am constantly looking at the latest and greatest market projections and analysis.  I’m sure I will get my fill of this topic at Symposium and will share any key takeaways in this space in upcoming posts.

It would also be impossible to ignore anything Big Data and Hadoop related. Everyone knows data is growing and there is a need to extract and process more data. While many companies, including Syncsort, like to talk about social media and mobile, we are also seeing businesses wanting to handle more granular data and more historical data. They don’t just want a month’s worth of a customer’s shopping habits or vendor purchases. Their marketing departments want three years’ worth of historical information…and they want it refreshed daily or even multiple times a day. Easy enough, right?

Enter Hadoop… OK, Hadoop entered several years ago. But we are seeing more and more interest in what Syncsort can do for Hadoop. At the conference, it’s going to be interesting to see what Gartner is saying and projecting for Hadoop usage, use cases, and the overall maturity of the Hadoop market and deployments.

Syncsort is also here with a strategic partner of ours, Clerity Solutions. We are in booth 529 showcasing how we can work with leading re-hosting partners like Clerity (Micro Focus and Oracle too) to accelerate mainframe modernization and migration projects. Stay tuned on Tuesday for a product announcement. We’ll also be sharing thoughts throughout the week on Twitter, using the hashtag #GartnerSYM.


Does Dedupe Still Matter?

October 12, 2011

I recently watched a video blog from ESG Senior Analyst Lauren Whitehouse, which she posted from SNW in Orlando.  In the video, Lauren discusses a presentation she made on deduplication and makes several interesting observations that really got me thinking.

First, she was surprised at how many people were in attendance, given that deduplication is yesterday’s news to an extent. Plus nearly everyone in the audience was already using dedupe technology. So why were they there?

“People are curious about what’s next,” Lauren says.  And the discussion has changed. I had to grin when Lauren said, “Just a couple of years ago… the big discussion was about in-line vs. post-process,” because I remember those arguments really well — I was making them myself (with a previous company).

Those were the days when Data Domain was staking out the in-line turf with an argument that was basically “in-line is the only way, the rest of you are stupid.”  Then they got bought by EMC and became a little arrogant according to some.

Now that the dust has settled, most vendors offer a mix of modes, and the real answer to the argument is “it depends,” which is exactly what the smarter observers were saying at the time (I recall Curtis Preston being one of them).

In any case, the world has moved on and the discussion along with it. There are so many options now, so many different places to deploy dedupe. It’s not just disk targets anymore. As Lauren puts it: “When you look at the number of solutions that are available: hardware, software, primary storage, backup storage, to the cloud, there’s just so many different things that have to be evaluated. It’s confusing.”

Confusing it is. And that’s one reason we’ve been focused on simplicity and completeness with NetApp Syncsort Integrated Backup (NSB). It may seem a bit of a paradox – don’t you get less complete as you get more simple? Not necessarily. Completeness is about having all the things you really need, while simplicity is about making them easy to use and as transparent as possible. Ideally, you strike a balance between the two.

With NSB, we’ve taken the notion of data reduction and inserted it across the backup process. Note that I say “data reduction.” Deduplication is a specific technology approach, while data reduction is the goal. NSB starts the process at the server by using a block-level backup method that’s designed to never read and copy the same data twice (we use our own technology with our agents, and leverage VMware Changed Block Tracking for agentless backups). This gets you the data reduction without the impact of server-side deduplication, which relies on reading all your data, computing hashes and comparing them, and then doing it all again the next time you back up. Dedupe at the server doesn’t make sense. You are stealing resources and creating impact in the last place you want to do that.
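To make the contrast concrete, here is a minimal Python sketch. It is not NSB code and it does not use the VMware Changed Block Tracking API; the function names, the tiny 4-byte block size and the set of changed block numbers are all made up for illustration. All it does is count how many blocks each approach has to read on the second backup of a volume in which a single block has changed.

```python
import hashlib

BLOCK_SIZE = 4  # hypothetical, tiny on purpose so the example is readable


def hash_dedupe_backup(volume: bytes, seen_hashes: set[str]) -> tuple[int, int]:
    """Hash-based dedupe at the server: every block is read and hashed,
    and only blocks whose hash is new are sent.

    Returns (blocks_read, blocks_sent).
    """
    blocks_read = 0
    blocks_sent = 0
    for start in range(0, len(volume), BLOCK_SIZE):
        block = volume[start:start + BLOCK_SIZE]
        blocks_read += 1
        digest = hashlib.sha256(block).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            blocks_sent += 1
    return blocks_read, blocks_sent


def changed_block_backup(volume: bytes, changed_blocks: set[int]) -> tuple[int, int]:
    """Changed-block backup: something else (an agent or the hypervisor)
    has already recorded which block numbers changed, so only those
    blocks are read and sent.

    Returns (blocks_read, blocks_sent).
    """
    payload = []
    for index in sorted(changed_blocks):
        start = index * BLOCK_SIZE
        payload.append(volume[start:start + BLOCK_SIZE])
    return len(payload), len(payload)


# Day 1 and day 2 contents of an imaginary four-block volume.
day1 = b"AAAABBBBCCCCDDDD"
day2 = b"AAAABBBBXXXXDDDD"  # only block 2 has changed

hash_index: set[str] = set()
hash_dedupe_backup(day1, hash_index)          # first backup reads and sends everything
print(hash_dedupe_backup(day2, hash_index))   # (4, 1): reads all 4 blocks to find 1 new one
print(changed_block_backup(day2, {2}))        # (1, 1): reads only the changed block
```

Both approaches send the same single block on day 2. The difference is how much of the server’s data has to be read and hashed to get there.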

Target dedupe is much more viable because you’ve got hardware designed to handle the load. NSB puts the deduplication where it belongs, at the disk target. Because NSB sends little duplicate data to the target, though, you don’t get very high dedupe rates there: most of the work is already done. The reason you see 95%-plus dedupe rates from other approaches is that you dump so much duplicate data into your target only to get rid of it. What a waste of effort. You can learn a lot more about this topic in our Beyond Deduplication white paper if you are interested.
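A bit of back-of-the-envelope arithmetic shows where those rates come from. The numbers below are assumptions picked purely for illustration (a 1,000 GB volume, 1% of it changing each day, 30 days of backups retained, and every changed block being unique); they are not measurements from NSB or any other product.

```python
def unique_data_gb(volume_gb: float, daily_change: float, days: int) -> float:
    """Data that actually has to be stored: one full copy plus each day's changes."""
    return volume_gb + (days - 1) * volume_gb * daily_change


def reduction_rate(ingested_gb: float, stored_gb: float) -> float:
    """Fraction of the ingested data that the target was able to discard."""
    return 1.0 - stored_gb / ingested_gb


# Illustrative assumptions, not measured values.
VOLUME_GB = 1000.0
DAILY_CHANGE = 0.01
DAYS = 30

stored = unique_data_gb(VOLUME_GB, DAILY_CHANGE, DAYS)  # 1,290 GB either way

# Approach 1: send a full copy every day and let the target dedupe it.
full_ingest = VOLUME_GB * DAYS                          # 30,000 GB sent
print(f"{reduction_rate(full_ingest, stored):.1%}")     # 95.7%

# Approach 2: send one full copy, then only the changed blocks.
incremental_ingest = stored                             # 1,290 GB sent
print(f"{reduction_rate(incremental_ingest, stored):.1%}")  # 0.0%
```

Under these assumptions both approaches end up storing the same 1,290 GB. The 95.7% figure mostly reflects the 28,710 GB of duplicate data that was read, sent and then thrown away.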

So does dedupe still matter? Of course it does. Data reduction is what makes disk-to-disk backup economically feasible, and deduplication is part of the data reduction process. Not all of it, necessarily, but part of it.

Lauren Whitehouse makes the point that in the last few years data reduction has moved down the list of projected IT spending priorities, but it’s still solidly in the top ten initiatives. It’s becoming a standard part of IT. As she notes, “Data growth is not stopping. It’s a continuous pain point for everyone. So I think it’s going to continue to be a high priority just in the face of how do I deal with all this data?”

We feel the best way to deal with data growth is comprehensively, from the server to the target. Squeeze out efficiencies wherever you can, in a way that minimizes impact and resource consumption.
