Expert Interview with Andrew Klein, Director of Product Marketing at Backblaze, on Backing Up Big Data
Backblaze currently stores more than 100 petabytes of customer data on software and hardware developed in-house. Big Data is all about the explosion of volume. Provocateurs have even suggested that Big Data means the death of archive. Do you agree?
Data can generally be categorized into three groups: transactional, warm and cold. Archived data generally resides in either warm or cold storage. If by the death of archive you mean that everything becomes transactional in nature, then Big Data will not be the death of archive. It makes no economic sense to keep everything transactional, as the cost goes beyond just storage and includes processing and memory considerations. It is better to keep archived data in cold storage and elevate it to transactional status when (or if) needed.
In a conventional data warehouse framework, a product like Syncsort DMX-h can be used to ETL data into the warehouse. Migration of data to an archive repository can be handled in parallel or at a later stage, with different pros and cons for the two strategies. Newer workflows have data piped into Hadoop or NoSQL databases. How might this change backup workflows?
This is not relevant at all to Backblaze, as all of our data is best classified as warm storage, which is processed inbound once and stored. Nothing else is required.
Does Backblaze help customers with encryption and certificate management? There are horror stories out there about backups that could not be restored because of permission problems or rejected certificates.
Backblaze handles all the encryption that is done via PKI. A user’s account credentials are used to create their key pair, which is used to encrypt their data on their system, before it is transported to Backblaze over an HTTPS connection and then stored encrypted at Backblaze. A user can elect to add a private encryption key that is used to encrypt their key pair, adding a second layer of protection.
This interviewer had an extended, unhappy experience with a widely recognized supplier of online backup. People can get pretty agitated when their backups have failed. Product reliability is obviously much of the answer, but how do you prepare customer service for life on the front lines?
We ask our customer service agents to provide the kind of service they would expect if they contacted us. That said, some people are rude and disrespectful right from the start, and our agents have the right to fire a customer and refund their money. We will go to great lengths to make sure people are satisfied with our service, so calling us a “F%#(Ung As%-&$(@” in your first email to us is not the way to remain a customer.
Backup and restore software has plenty of nuance that is lost on the casual observer: treatment of deleted files, renames, permission timelines, handling of shadow or versioned files, overwrite rules, and of course hardware and network errors. How does this affect the perception of the backup marketplace and valuation? How is that perception best counteracted?
We’ve always taken the path that online backup should be so easy that it is invisible to the user. Back up all the data by default, be unlimited, no sneaky pricing add-ons, etc. Since every hard drive will eventually fail, our customers will get their data back, be happy, and tell others; and our business will grow. We believe our valuation will be based on that growth and our demonstrated ability to manage our business. We are not looking to be valued based on a TBD business model, selling our customers’ eyeballs or being today’s flash in the pan.
How do you address the problems customers can encounter when they attempt to restore versions of code or data onto different configurations – OS updates, possibly upgraded hardware and changed permission schemes?
All restores, whether for a single file or a million files, are delivered in ZIP format. This gives the user ultimate control over which files to restore and where to restore them, using a well-known, reliable technology.
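To illustrate the kind of control a ZIP-format restore gives the user, here is a minimal sketch in Python using only the standard `zipfile` module. The function names `restore_selected` and `make_demo_restore`, the member paths, and the in-memory archive are all hypothetical stand-ins for a downloaded restore; nothing here reflects Backblaze's own tooling.

```python
import io
import os
import tempfile
import zipfile


def restore_selected(archive, wanted, dest_dir):
    """Extract only the requested members of a ZIP restore into dest_dir.

    Returns the list of member names that were actually extracted.
    Names that are not present in the archive are skipped.
    """
    restored = []
    with zipfile.ZipFile(archive) as zf:
        available = set(zf.namelist())
        for name in wanted:
            if name in available:
                zf.extract(name, path=dest_dir)
                restored.append(name)
    return restored


def make_demo_restore():
    """Build a small in-memory ZIP standing in for a downloaded restore."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("docs/report.txt", "quarterly numbers")
        zf.writestr("photos/cat.txt", "placeholder for an image")
    buf.seek(0)
    return buf


# Restore a single file to a location of the user's choosing.
with tempfile.TemporaryDirectory() as dest:
    done = restore_selected(make_demo_restore(), ["docs/report.txt"], dest)
    print(done)
    print(os.path.exists(os.path.join(dest, "photos", "cat.txt")))
```

Because the user picks both the members and the destination directory, the same archive can be unpacked onto a machine with a different OS version or directory layout than the one that was backed up.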
How do customers classify and manage their backups? Priority? Type? Application? Database? Are there best practices, or do you leave that to customers to sort out?
We automatically back up all the user’s data files on a continuous basis. The user is not involved in the process. Different versions of a file are kept for 30 days and can be retrieved by date if desired. Of course, we keep the newest version of any data file on the user’s system.
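The retention rule described here (versions kept for 30 days, retrievable by date, with the newest version always kept) can be sketched roughly as follows. This is an illustrative model only: the tuple layout `(saved_at, label)` and the function names `prune` and `version_as_of` are assumptions made for the example, not a description of how Backblaze implements versioning.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # the 30-day window mentioned in the interview


def prune(versions, now):
    """Drop versions older than the retention window, but always keep the newest.

    `versions` is a list of (saved_at, label) tuples (a hypothetical layout).
    """
    if not versions:
        return []
    newest = max(versions, key=lambda v: v[0])
    kept = [v for v in versions if now - v[0] <= RETENTION or v is newest]
    return sorted(kept, key=lambda v: v[0])


def version_as_of(versions, when):
    """Return the most recent version saved at or before `when`, or None."""
    candidates = [v for v in versions if v[0] <= when]
    return max(candidates, key=lambda v: v[0]) if candidates else None


history = [
    (datetime(2024, 1, 1), "v1"),
    (datetime(2024, 3, 10), "v2"),
    (datetime(2024, 3, 25), "v3"),
]
current = prune(history, now=datetime(2024, 3, 31))
print(current)
print(version_as_of(current, datetime(2024, 3, 20)))
```

In this model a file that has not changed for months still has exactly one retained copy, since the newest version is exempt from the 30-day cutoff.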
What role do you see for compression and encryption? Customer-managed vs. backup-managed – if there is a difference?
We use both compression and encryption. Before data is backed up from a user’s system, a copy is compressed and encrypted and then sent to us for storage. Again, following the “make backup easy” paradigm, we do this without involving the user. When files are restored, the user presents their account credentials, and the data they require is uncompressed and decrypted for them.
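The order given in this answer, compress first and then encrypt, matters in practice: well-encrypted data looks statistically random and barely compresses at all. The sketch below shows the effect using Python's standard `zlib` module. Note the assumptions: `os.urandom` output is used purely as a stand-in for ciphertext, the sample text is invented, and no real encryption is performed, since the interview does not specify the cipher details.

```python
import os
import zlib


def compression_ratio(data):
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Repetitive plaintext, loosely resembling ordinary user documents.
text = b"back up all the data by default. " * 500

# Random bytes as a stand-in for ciphertext (an assumption for illustration).
random_like = os.urandom(len(text))

print(f"plaintext ratio:   {compression_ratio(text):.3f}")
print(f"random-like ratio: {compression_ratio(random_like):.3f}")

# Compression is lossless, so the restore path can undo it after decryption.
assert zlib.decompress(zlib.compress(text)) == text
```

Compressing before encrypting therefore reduces both upload time and stored bytes, while the reverse order would gain almost nothing.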
Are there any regulation, audit or compliance considerations that are either driving your product designs or motivating prospects to seek you out?
Encryption of data at rest is a requirement of several regulations, such as HIPAA. We didn’t design our system with encryption specifically in support of such regulations; we did so to protect our users’ data. The fact that we also help people and organizations meet their encryption-based regulatory requirements is a bonus.
Is backup better integrated with DR and risk management schemes today than 10 or 20 years ago?
Making online backup automatic and easy to use certainly helps organizations in their DR and risk management planning. The functionality of systems such as Backblaze is well understood and easy to integrate into such plans.