May 2011

If you aren’t using it already, you probably have Microsoft SharePoint somewhere in your future. Back in 2008, Bill Gates somewhat famously noted that SharePoint was the fastest growing application in Microsoft history.  The momentum doesn’t seem to have slowed much in the years since, and numerous companies are adopting SharePoint to enhance collaboration and data sharing both internally and externally with partners or customers.  

But while SharePoint can help your users work better, it presents a unique headache from a data protection perspective.  To keep it simple, SharePoint backup troubles come in two main flavors: too much data growth, and a complex application environment.

Regarding data growth, SharePoint is becoming famous for “SharePoint sprawl” and projects running way over anticipated amounts of storage space.  I was a rather early user of SharePoint as a Microsoft employee back in 2007, and one of the things that we saw in spades there was SharePoint sites growing like weeds. As soon as, say, the engineering group in my department got an overall site, another dozen people sprang up insisting they could use sub-sites. These then spawned further sub-sites.  And this happened across all the different groups: engineering, test, support, marketing, etc.

One aspect of this was a proliferation of duplicate data. In my role, I would have some access to engineering sites that others in marketing didn’t, so if I got something interesting from an engineering site, naturally I posted it on the marketing site so my team could get access. This happened, I don’t know, about a thousand times a week! It seemed like it anyway. I wasn’t in a very large group by Microsoft standards (100+ people), but boy did our SharePoint get stuffed with documents quickly.

Traditional, file-based methods of backup – as we all know by now – have long been struggling to keep up with data growth, and SharePoint is no exception.  Even if your backup works in the early stages of your SharePoint deployment, you might find yourself running into problems sooner than you anticipate.

The other issue, and really the more important one, is the complexity of the SharePoint application environment. Even the simplest single server SharePoint environment combines a SQL database backend with multiple “sites,” search databases, and the need for individual item recovery (since SharePoint is still very much a big file server).

But almost any environment of any size will not be a single server but a SharePoint “farm.” And that’s where it gets hairy.  Now your SharePoint application is split across multiple server roles: front-end web servers, application servers and sites, database servers. If your backup application doesn’t understand the farm in its entirety you may have big problems protecting what you need to protect and recovering what you need to recover.   

The biggest validation of this challenge is the proliferation of SharePoint-specific backup products on the market. Indeed, an entire industry exists of products designed to back up SharePoint better than the legacy backup products handle it. In fact, a well-known data protection industry analyst said to me that legacy product coverage of SharePoint was “pretty pathetic” (this was a conversational remark, so I’m not mentioning names).

The trouble with point products is that they represent extra cost, a new learning curve, and they remain forever outside the rest of your backup environment. Do you really want one backup solution for everything except SharePoint? (This is assuming you are already lucky enough to have figured out how to do everything else with one solution!)

We think NetApp Syncsort Integrated Backup has some exceptional SharePoint features in terms of both backup and recovery, especially for a general purpose solution that you can also use for the rest of your data protection.

Rather than elaborate on those features here, I will invite you to a webinar I am doing later today (Thursday, May 26th at 1:00 p.m. EST). I’ll be describing our SharePoint solution and also providing a demo so you can see it in action. You can sign up for the webinar here. Even if you register and can’t make the live presentation, we’ll send you a link for the recorded version as soon as it’s available.


I’ve been spending the week in Frankfurt, Germany, participating in a sales and systems engineer training for the Syncsort EMEA team.  We’re ramping up NSB quickly in the EMEA market now that we’ve announced distribution partnerships with Avnet and ALTIMATE.  Other partnerships will be signed too, so stay tuned.

As with NSB in the North American market, the technology has been around for many years and we have many customers in the region using the combination of Syncsort software and NetApp storage.  The NSB solution branding makes it easier for partners to deliver the offering and for customers to purchase it, and behind the scenes a joint NetApp/Syncsort support agreement ensures smooth sailing for support issues.

Meanwhile, members of the Syncsort team enjoyed a nice beer garden dinner Wednesday night, complete with lots of fresh asparagus, which is currently in season.  Mixed in with all the other discussion was a vigorous debate over the use of the word “leverage,” which our resident UK language expert and professional services guru Phil Kilburn insisted should not be used as a substitute for “use.”  I admit to being guilty of this all the time, as in sentences such as “NSB leverages NetApp snapshot technology for rapid recovery of data.” 

Phil’s correct about the usage, but “leverage” has become commonly accepted business jargon, like everything from “action item” to “zero sum,” words we use every day without thinking about how silly they might be.

Well, here in EMEA, our NSB partner community will now be able to – pardon me, Phil! – leverage the high value-added services provided by Avnet and ALTIMATE to further their data protection lines of business and bring best-in-class data protection and recovery to their customers.  And that’s not a zero-sum situation for anyone!


Today, Syncsort announced our strategy and entry into the Hadoop community.  This is really exciting, as our customers have told us they are pushing more and more into Hadoop as “big data” grows in their enterprise and the need to scale becomes even more critical for their businesses.  Our developers are really excited about what Syncsort is doing, as well.  Even though we are an East Coast based company, several of them are even threatening to dye their hair purple…Hadoop purple!

Our announcement has two major components to it.  The first part is that we intend to contribute an external sort “plug-in” to the community.  There have been calls in the past for performance enhancements and other optimizations to sort.  With this contribution, anyone could seamlessly plug their own sort engine into Hadoop by using the published interfaces, including Syncsort’s solution (more details on that below).  With Syncsort’s 40+ years of experience in sorting, we believe we have unique expertise we can apply for the benefit of the larger Hadoop community.

While other data integration vendors are talking about Hadoop, we have not seen any of them embrace the community by making contributions. We believe this distinguishes Syncsort’s entry into the community and hope that it is viewed as a sign of our sincerity and excitement around working with the open source community and customers to truly make Hadoop better and even more valuable than it is today.

The second part of our announcement is the new DMExpress Hadoop Edition.  Entering a limited availability beta period in June, this new offering will encompass three components:

  1. HDFS connectivity: extract and load HDFS.  We actually can do this today with examples we ship in the product.  If you’re a DMExpress customer, check this out in the online help.
  2. The sort acceleration piece from our contribution (discussed above) to actually improve the sort performance.  Our marketing team (who I think is also dabbling in the purple hair thing) is calling this Hadoop Acceleration.  While we are contributing the plug-in, the actual sort from Syncsort will be this new DMExpress Hadoop Edition.  As you can see from our announcement, we have seen some pretty good performance improvements.  We will continue to benchmark our acceleration throughout the beta period.  Stay tuned to this blog for more results.
  3. The ability to create MapReduce jobs in the DMExpress graphical environment, rather than write Java, Pig scripts, etc.  If you know DMExpress, this is the Task Editor.  If you need to write data transformations, reformatting, aggregations, etc., you can now use our Task Editor.  DMExpress will automatically deploy on the Hadoop cluster, sourcing the data from HDFS and running the transformations across the cluster.  Not only is the processing faster, the jobs are much easier to write and maintain.
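For readers who have never written one by hand, here is a rough idea of the shape of a simple aggregation job in the MapReduce style. This is only a plain Python sketch of the map, shuffle, and reduce steps. It does not use any Hadoop or DMExpress API, and the record layout (`region,amount`) and function names are made up for illustration.

```python
from collections import defaultdict

def map_record(line):
    """Map step: turn one input line into (key, value) pairs."""
    region, amount = line.split(",")
    yield region, float(amount)

def reduce_group(key, values):
    """Reduce step: collapse all values for one key into a total."""
    return key, sum(values)

def run_job(lines):
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_record(line):
            groups[key].append(value)
    return dict(reduce_group(key, values) for key, values in groups.items())

# Hypothetical sample input: one "region,amount" record per line.
sample = ["east,10.0", "west,5.5", "east,2.5"]
totals = run_job(sample)
print(totals)
```

On a real cluster the map and reduce steps run in parallel across many nodes, and the framework handles the grouping in between. That plumbing is what a graphical tool is meant to spare you from writing.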

This is obviously just the beginning.  I am very excited about our announcement today and our entry into the Hadoop community.  We have received overwhelmingly positive feedback from our customers and the industry analysts we have briefed.  Stay tuned for more details and results from our beta testing.  I even promise to post pictures of any Syncsort developers or marketing folks that actually follow through with the purple hair!


The folks at EMC have been getting a lot of mileage lately over a particular user story where they deployed Networker, Data Domain and Avamar. They’ve promoted it in a press release and have even generated some press coverage.

Now when I read these items, I had multiple “hmmm” moments. Those are moments where I think, “Hmmm… this could have been done a lot better with NSB.”  In fact, EMC’s case study for its solution is, for me, a case study for our ongoing message about how users need to go “beyond deduplication” to really solve their problems.

The user in question was facing a much too common dilemma for those using traditional file backup to tape: backups were taking too long. There is no mention made of the generation of tape technology they were using, but nearly any time you replace tape with disk you’ll see better performance. There can be exceptions, but more often than not money is better invested moving to a disk-to-disk technology than upgrading to faster tape drives.

They also had a truly painful remote site tape problem:  67 remote locations were backed up using tape. I wouldn’t wish that on anybody! 

So EMC suggested moving them to a disk-based solution. So far, so good. But let’s first look at exactly what was replaced. From the press release:

CA BrightStor ARCserve and Symantec NetBackup backup applications were replaced by EMC Avamar® and EMC NetWorker®, and tape libraries and tape drives were replaced by Avamar Data Stores and EMC Data Domain® deduplication storage systems. 

Unless I’m missing something, the solution was to replace two backup software packages and tape with two backup software packages and two new disk platforms.  By my count, that’s going from three disparate solutions to four. But let’s not get stuck only on the numbers.

Seems they started by replacing the remote site backups with Avamar. OK, that makes sense. Anything to get rid of those 67 remote tape drives. It would also seemingly make sense to use Avamar everywhere, but…

[The user] then implemented EMC Data Domain deduplication storage systems with EMC NetWorker backup and recovery software to protect its Microsoft Exchange, Microsoft SQL Server, Oracle and file sharing environments.

Why would you go to different software and hardware here?  Wouldn’t you just prefer one solution everywhere? The answer is that Avamar often fails in larger data environments. One of the real challenges with client-side deduplication is that it can’t hack environments like email and databases, or even large file shares. There is just too much work to do grinding through all the byte-hashing every day. This is also why “backup time” numbers can be disingenuous.  Sure, if you only count the time you spend transmitting bytes the backup might be 20 minutes. But if you count the time hashing data and the time the CPU spikes, it can be hours.  Think of it like warming up your car for two hours preparing for a trip to the grocery store around the corner. The “trip” was five minutes, but you used a lot of gas to get there.
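To make the hashing point concrete, here is a minimal sketch of generic hash-based client-side deduplication. It is not Avamar's actual algorithm. The fixed 4 KB chunk size, the SHA-256 hash, and the function names are all assumptions chosen for illustration. The thing to notice is that every byte gets read and hashed on every pass, even when almost nothing is sent.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real products differ

def dedupe_backup(data, known_hashes):
    """Run one backup pass; return (bytes_hashed, bytes_sent)."""
    bytes_hashed = 0
    bytes_sent = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        bytes_hashed += len(chunk)      # every chunk is read and hashed
        if digest not in known_hashes:  # only unseen chunks are transmitted
            known_hashes.add(digest)
            bytes_sent += len(chunk)
    return bytes_hashed, bytes_sent

def make_chunk(n):
    """Build a deterministic 4096-byte chunk whose content depends on n."""
    return n.to_bytes(4, "big") * (CHUNK_SIZE // 4)

store = set()
day1 = b"".join(make_chunk(i) for i in range(100))
day2 = day1[:-CHUNK_SIZE] + make_chunk(1000)  # one chunk changed overnight

hashed1, sent1 = dedupe_backup(day1, store)
hashed2, sent2 = dedupe_backup(day2, store)
print(hashed1, sent1)
print(hashed2, sent2)
```

On the second day only one chunk crosses the wire, but the client still hashes the full data set. That work is the part a "backup time" figure based on transmission alone leaves out.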

That’s one of the really nice things about the NSB model.  It never scans the file system. It doesn’t have to read and re-read the same data over and over. Read it once, move it once, and done. The difference in overhead is enormous, and it lets NSB handle big applications with no problems. (We recently helped a university in Texas drop its SQL backup from 48 hours “a day” to about 20 minutes.)

This also points out a huge weakness in the deduplication target model. We learn from the article that the user “backs up about 24 TB a week with EMC NetWorker and Data Domain.”

24 TB a week?  This is a perfect illustration of the problem I like to call “Data Lift.” Because Data Domain doesn’t relieve pressure on the server, you are asking your critical applications to spend the resources needed to read and move 24 TB of data every week. That’s 1.2 petabytes a year of effort, 90% of which is moving the same bits again and again.
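The arithmetic behind that figure is easy to check. The sketch below assumes 52 weeks in a year and decimal units (1 PB = 1,000 TB), and treats the 90% as my rough estimate from above rather than a measured value.

```python
WEEKLY_BACKUP_TB = 24      # figure quoted from the article
WEEKS_PER_YEAR = 52        # assumption: backups run every week of the year
REDUNDANT_FRACTION = 0.90  # rough estimate from this post, not a measurement

def yearly_data_lift_tb(weekly_tb, weeks=WEEKS_PER_YEAR):
    """Total data the application servers must read and move in a year."""
    return weekly_tb * weeks

total_tb = yearly_data_lift_tb(WEEKLY_BACKUP_TB)
total_pb = total_tb / 1000                 # decimal petabytes
redundant_tb = total_tb * REDUNDANT_FRACTION

print(f"{total_tb} TB per year, about {total_pb:.1f} PB")
print(f"of which roughly {redundant_tb:.0f} TB is the same data moved again")
```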

With NSB, you stop the “Data Lift” problem at the source by moving the data once (and once only!).  Target dedupe is designed to solve a problem that backup creates. Well, if you’re looking at re-architecting your backup model, doesn’t it make sense to solve ALL your problems at once rather than just band-aid them?

Something else goes unmentioned in these stories.  There is a lot of talk about better backup times, but there isn’t a word about better RPO. That’s because even with the multiple solutions that EMC brought into the account, backup is still once a day. Nothing in the EMC technology lets you back up more frequently, like you can with NSB.  Again, if you’re tearing out everything for something new, wouldn’t you like a solution that lets you back up, say, every hour? With less impact than you create backing up once a day? And using no more storage?

NSB lets you think big, outside of the dedupe box.

There’s also recovery. Now clearly, there were good strides made in this area. The article cites a case where internal customers needed to do research on what I’m guessing were larger data sets (it’s not specified). Before, the company had an SLA of two weeks: time to fetch the tape, re-index it, dump it to storage somewhere, etc.   Now they get it done in “a number of hours” by moving data to a virtual machine.

An improvement, surely. But how about getting that virtual machine up and running in ten minutes? Or giving access to the data itself with two mouse clicks and about 60 seconds of elapsed time?  NSB can do it, because we leverage the power of NetApp Snapshot technology when we store backups. There is no need to transform data from a backup format to a usable format: it’s always usable, always right at your fingertips.

Finally, NSB is really one, integrated solution as opposed to two hardware platforms and three software modules (and nobody mentioned bare metal recovery, which would require yet another EMC product, whereas with NSB it’s just there, with BMR functionality available from every backup).  NSB will also eliminate the media servers in the environment, cutting out a layer of cost and complexity.

Maybe that’s why when NSB goes head-to-head with EMC in customer evaluations, we’re winning 80% of the time. Even in EMC shops, despite all the advantages that the incumbent has.

There really is a better way.
