Last time out, I posted about the two “problem children” of backup: Big Files and (many many) Little Files.
Traditional backup is failing on both these fronts, and for the same reason: it sees the world as files. Every time a file changes, you have to back up the whole magilla. This is an obvious problem with Big Files (e.g. databases). If you have a terabyte-sized DB file and only 5% of it changes each day, you still back up the entire TB of data. That’s 95% wasted effort.
The same thing happens with Little Files. Sure, if you have a 2 MB PowerPoint file and you update 10% of it, it’s not that big a deal to back up the full 2 MB, assuming you only have one PowerPoint file. Unfortunately, we have them by the thousands and millions (plus spreadsheets, Word docs, Visios, images…). Little Files also have a second problem: the process of reading them is itself inefficient (it takes a whole lot longer to copy a thousand files that total 5 MB than it does to copy a single 5 MB file).
Fortunately, the answer to both of these dilemmas is simple: stop looking at the world as a bunch of files. Files are built from blocks, and from the perspective of backup it makes a great deal more sense to see the world as a pile of blocks rather than files. If I see only blocks, when that TB-sized database changes by 5%, I only have to worry about moving the 5%. My 95% wasted effort is now 95% efficiency I’ve recaptured.
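To make the blocks-versus-files idea concrete, here is a minimal Python sketch. It is an illustration only, not how any particular product works: it finds changed blocks by hashing fixed-size chunks of two versions of a file, and the 4 KB block size and the names `block_digests` and `changed_blocks` are my own assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size, chosen only for illustration


def block_digests(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Split data into fixed-size blocks and return one SHA-256 digest per block."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> dict[int, bytes]:
    """Return {block_index: block_bytes} for every block of `new` that differs from `old`."""
    old_digests = block_digests(old, block_size)
    changes: dict[int, bytes] = {}
    for index, offset in enumerate(range(0, len(new), block_size)):
        block = new[offset:offset + block_size]
        is_new_block = index >= len(old_digests)
        if is_new_block or hashlib.sha256(block).digest() != old_digests[index]:
            changes[index] = block
    return changes


# A 100-block "file" in which 5 blocks (5% of the data) are touched.
old = bytes(100 * BLOCK_SIZE)
new = bytearray(old)
for index in (3, 17, 42, 58, 99):
    new[index * BLOCK_SIZE] = 0xFF

delta = changed_blocks(old, bytes(new))
moved_fraction = sum(len(block) for block in delta.values()) / len(new)
print(sorted(delta))   # indices of the blocks that would be moved
print(moved_fraction)  # fraction of the file that has to travel
```

A file-level backup would move all 100 blocks here; the block-level view moves only the five that changed. Note that this sketch still reads both versions in full to compare them, which a backup that tracks changes as they happen would not need to do.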
And that’s just how Syncsort backup works in our NetApp Syncsort Integrated Backup offering. We monitor data changes below the file system layer, at the level of the disk blocks. Each scheduled backup then moves only the block-level changes. Each of these block updates is added to the original “base” backup (the one full backup you ever have to do), and through the magic of snapshot technology each block update backup can be viewed as a full data volume (thanks NetApp SnapVault!). In other words, for the “cost” of a block-level incremental backup you get the “value” of a full backup, every time.
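The "incremental cost, full value" idea can be modeled in a few lines of Python. This is a toy sketch under my own assumptions, not a description of SnapVault internals: a volume is a dictionary of block index to block contents, each incremental is a dictionary holding only the changed blocks, and `full_view` is a hypothetical helper name. Real snapshot technology shares unchanged blocks rather than copying them as this model does.

```python
def full_view(
    base: dict[int, bytes],
    increments: list[dict[int, bytes]],
    point: int,
) -> dict[int, bytes]:
    """Reconstruct the whole volume as of recovery point `point`.

    Point 0 is the base backup alone; point N is the base with the
    first N block-level incrementals layered on top, oldest first.
    """
    view = dict(base)  # copy, so the base backup is never modified
    for delta in increments[:point]:
        view.update(delta)  # newer blocks replace older ones
    return view


# One full "base" backup, followed by three small block-level incrementals.
base = {0: b"A", 1: b"B", 2: b"C"}
increments = [{1: b"b1"}, {2: b"c2"}, {1: b"b3"}]

for point in range(len(increments) + 1):
    print(point, full_view(base, increments, point))
```

Every recovery point comes back as a complete volume, even though each backup after the first stored only the blocks that changed.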
The bottom line? Backups complete much faster, typically in minutes (versus what used to take hours). Backups are much more reliable, with success rates in the 99%-plus range. Backups have much less impact on server, storage and network resources, cutting that load by 90% or more. And since backups are fast and low-impact, you can run many backups per day, capturing many more recovery points and greatly improving your RPO. Finally, a way to meet those stringent recovery SLAs that doesn’t cost you a fortune in primary storage snapshots!
There’s more to this story. A lot more! For starters, there are important differences in how a product decides what blocks need to move. We’ll explore that next time.