Whenever I'm dealing with a fairly complex problem over a long period of time, I find it helpful to write a manifesto to lay out exactly what I want to accomplish and how to go about doing it. It seems to organize my thought process in such a way that I more clearly understand what the desired result really is, and lets me achieve that result sooner. Here for your perusal is the manifesto I worked on for several days on the subject of revising my backup strategy at work (and at home). If either of my readers can add anything to my thinking here, I'd definitely appreciate the feedback.

At work a couple of years ago, we started scanning all paper that would otherwise be filed, and also started taking all x-rays digitally. This has huge advantages, and I'd never consider going back, but the big current drawback is that we've now outgrown the capacity of the DAT72 tapes we were using for backup.
At the same time, I've been trying to reevaluate my backup needs at home. I decided I wanted to get all my data onto my laptop, including my music collection, so I never had to use our flaky sucky desktop computer. So, I upgraded to a 320GB drive in the laptop, which is great, but it's now the biggest drive in the house, so I can't conveniently back it up to an external drive like I used to without spending more money on a new, bigger external drive.
I've also just got a new flash-memory-based hi-def camcorder (the Canon HF100, which I'm very enamored with so far). Flash is convenient, but I can't just keep a box of tapes around the house anymore, and I'm sure as heck not going to buy a new card every time it fills up (at today's prices, anyway). So, I needed to work out some way to store my "magical moving graven images" as part of my backup strategy.
My needs at home are fairly simple. I just want all my data backed up, and I don't want to have to fiddle with anything to make it happen. I want a backup off-site, as well. I used to use Time Machine (OS X's built-in backup magic), but I'm stuck with a couple of its limitations now. Time Machine is only really useful when the external drive you're backing up to is larger than your computer's drive. It can back up a smaller data set than your whole drive, but unlike other backup programs, you don't tell Time Machine what you want it to back up. Instead, it defaults to the whole drive, and you have to go in and tell it what you don't want backed up. I find the interface for setting exclusions to be a little cumbersome and time-consuming, especially if the only thing you actually want to back up is buried several directories deep on your hard drive. Additionally, Time Machine doesn't provide me any off-site backup unless I use a different external hard drive and take it somewhere, which isn't a huge hassle, but gets out of date unless I bring it back and swap frequently.
My needs at work are simple, too: all the patient data must never go away or go bad, ever, for any length of time. The actual implementation of that ideal is a little more complicated, though.
The server's got a mirrored RAID, so that addresses the problem of immediate hard drive failure, but then I need something external to the server to handle the case of complete server meltdown or database corruption, neither of which RAID can help with. So, I back up to tape every night. But, there may be cases where some problem or change happens that we don't catch right away, and need to go back a few days. So, I set up a week's worth of backup tapes to be able to go back to any point during that week. I may need to go back farther than that, so I've got a set of tapes alternating weekly backups so that I can go back a couple of weeks if necessary (although not necessarily to the precise date that I want). And I need offsite for disaster recovery, so I alternate a set of monthly tapes and take one with me every month.
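If it helps to picture the rotation, here's a rough sketch in Python of the scheduling logic I'm describing. The tape labels, the Friday weeklies, and the first-Friday monthlies are just my illustration of the scheme, not a prescription:

```python
from datetime import date

def tape_for(day: date) -> str:
    """Which tape goes in the drive tonight, under a daily/weekly/monthly
    rotation like the one described above (labels are made up)."""
    if day.weekday() == 4:                 # Fridays get the bigger-picture tapes
        if day.day <= 7:                   # first Friday: one of two alternating monthlies
            return "Monthly-" + ("A" if day.month % 2 else "B")
        return "Weekly-" + ("A" if day.isocalendar()[1] % 2 else "B")
    return "Daily-" + day.strftime("%a")   # everything else: that weekday's tape

print(tape_for(date.today()))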
There are a number of drawbacks to the current setup. One, if someone doesn't put the right tape in on the right day, the backup fails. I assumed this wouldn't be a problem, but I've left it to other people to do instead of doing it myself, and I'm only getting about 90% success that way. That's not a huge deal on a daily backup, but if I miss a monthly and somehow don't catch it, I might end up with an off-site backup that's three months old. With my luck, that would be the month we have the fire, and then we're really screwed. Two, this only addresses our patient record database and the x-rays. It doesn't back up anything else, like our accounting records, patient correspondence, internal documents, or other files we'd really miss if they were lost.
The bigger drawback, though, is that even though we're just backing up patient records and x-rays using this method, our data set is just too big for those tapes which formerly seemed so capacious. I can fix that a couple of ways. I can get an autoloader and keep using DAT72. That would fix the tape-swapping compliance issue as well, but cost a lot. I could get a new drive and tapes in some higher capacity. That would still have the compliance issue, and also cost money. Or, if I didn't want to spend any more money, I could reconfigure the daily backups to be differentials instead of doing a full backup each day. Then I'd only do a full backup once per week, scheduling that backup to run overnight as usual, but finishing the next day after someone changes the tape. This would mean even more tape changes for someone to forget, and would actually put part of the backup during business hours, which I don't want. Our hardware isn't exactly top of the line anymore, and I want the server to not be spinning its disks feeding a backup at the same time three other people are trying to feed it an x-ray.
So, what I really want to do at work is take the humans out of the equation and get something automated, with a more up-to-date off-site backup. That means separating the tasks into some sort of local backup coupled with an online backup service. The local backup should be easy, because I have plenty of unused disk space on the network here. I can designate one or more machines as the home for backups, and then just configure my backup software to copy the data to them with the right combination of full backups and differentials to get as far back as I want to go. That part of the problem's solved with $0 expenditure for at least a few more years.
For online, the first place I looked was the vendor of the practice management software. They have an online backup client and storage available, but the client is hideous and ridiculously complicated. Worse, the storage component is ludicrously expensive. We'd back up maybe 40GB, but because it's priced in tiers, we'd have to sign up for the 50GB tier at $150/month. That's after a $100 startup fee, too. That's asinine. This is Dentrix eBackup, by the way. I need to name it here in case anyone is insane enough to consider it and googles the name.
I then checked out Mozy, because I had heard a few good things about them. They had a lot going for them: they were cheaper (even in their business offering), and had a cross-platform client. But when I looked closer, I didn't like what I saw.
They had a free trial, but it's capacity-limited, so I had to pony up for one month at $26.95 just to test how it would deal with my full backup set. The pricing is also tiered, although much more reasonable than Dentrix. I still don't like this, though, because it's capped at the level you buy. If your backup set starts to exceed that level, I don't know what will happen, but it seems like your backup will fail. Sure, you'll probably get notified that it's time to upgrade your plan, but I don't want to deal with that. Also, the Mac client was second-class to the Windows client, which was already nothing to write home about.
When I started Googling for reviews, I discovered that they evidently have some really huge issues restoring files. You can restore through the client, but there's some sort of packaging that needs to be done, and that takes a while and apparently doesn't work all the time. You have the option of restoring from a web page, which is really nice, except again you have to wait for the files to be packaged, which could take a day or more for a large backup set, delaying what will already be a painful download. And again, it doesn't work all the time, either.
A third option for restore is for them to burn your data to DVD and FedEx it to you. If you've got 50GB of data, this seems like an excellent solution whatever the price, since it would take several days to download 50GB on a T1. However, the net is full of stories of people who ordered restore DVDs and then didn't get them for weeks. That's completely unacceptable. If you're going to offer a service like that, you have to automate the process: allocate x minutes for burning each DVD, then when the order is placed, have your order system do the math to determine whether it makes that day's cutoff or gets bumped to the next. Display that delivery date to the customer, and then stick to it! If you can't do this, don't offer the service. If you do offer a service that you know you can't deliver in the way a customer would expect, you might as well have your order confirmation page be a big ASCII drawing of a middle finger, because that's the kind of contempt you're showing for your customers.
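Just to show how little code that scheduling actually takes, here's a back-of-the-napkin sketch. Every number in it (burn time, cutoff hour, queue length) is hypothetical; the point is that the math is trivial:

```python
from datetime import datetime, timedelta

BURN_MINUTES_PER_DVD = 20  # assumed burn-and-verify time per disc
CUTOFF_HOUR = 15           # assumed 3 PM daily shipping cutoff

def promised_ship_date(ordered: datetime, queued_minutes: int, dvd_count: int) -> datetime:
    """Commit to a ship date: current burn queue plus this order's burn
    time, bumped to the next day if it misses today's cutoff."""
    done = ordered + timedelta(minutes=queued_minutes + dvd_count * BURN_MINUTES_PER_DVD)
    cutoff = done.replace(hour=CUTOFF_HOUR, minute=0, second=0, microsecond=0)
    return done if done <= cutoff else cutoff + timedelta(days=1)

# 50GB is roughly 11 single-layer DVDs; say three hours of burns are already queued.
print(promised_ship_date(datetime.now(), queued_minutes=180, dvd_count=11))
```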
Aside from the DVD thing, a backup system without a bulletproof restore process is no backup system at all. So, after reading all of those horrible things about Mozy, I didn't even wait for the first backup to finish and went to cancel my account. There is no link online to cancel a pro account. I had to email, and the guy told me where to go to find the link to cancel, but it wasn't there, and I sent him a screenshot to prove it. He said he'd cancel it manually, but I'll believe it when I don't see a charge next month.
Sorry, Mozy. I never really got the chance to get to know you myself, but evidently you really suck.
Many of the online reviewers of Mozy mentioned that once they kicked Mozy to the curb, they switched to CrashPlan and loved it. So, that was my next stop. I'm not in love yet, but I do kind of have a crush on CrashPlan.
CrashPlan is a little more of a philosophy than a backup program. CrashPlan's philosophy is that what everyone needs is a nice, simple program that can run in the background and back up your data to one or more friends' computers over the Internet. I've seen some programs with this idea before, but CrashPlan's is by far the most polished and simple. The thinking is that you'll know someone with extra space on their computer who'll host a backup for you, and you might have some extra space to return the favor. The common scenario they suggest, if you don't have the free space, is for each of you to buy an external disk and station it at the other's house.
Online backup is all it does, over LAN or Internet. No backup to a local disk. No backup to external drives. Just online backups. The software has a one-time charge of $25 (up from $20), or $60 for the more customizable Pro version, then no other charges as long as you're supplying your own backup location.
If you don't have any friends, fear not, because CrashPlan will offer to be your friend for a fee. You don't have to host their backups for them, either. Their CrashPlan Central service provides integrated hosting for your backups at a flat fee of 10 cents per GB per month, with a minimum charge of $5. That's about half the price of Mozy, and 1/30th the price of Dentrix.
The other features that I like are:
- 30-day free trial of the software and unlimited CrashPlan Central
- All of the data is compressed, encrypted, and deduplicated before being sent, to minimize bandwidth. Files that have changed will only send the portion of the file that has changed. They even claim that if multiple computers are backing up to the same location, it will not send any data that's duplicated between them. (Note: I had read this in someone else's review. Turns out it's not true. Sorry.)
- Versioning, with the option to keep x versions, or unlimited.
- A pretty simple restore interface. You just pick the date and time you want to restore to off a calendar and go from there.
- An intelligent prioritization algorithm that backs up in order of modified date, putting smaller, recently changed files ahead of larger, stale ones. Even if the first backup's not done, it'll make sure that recently changed files get backed up first, and re-backed up if they change again, before moving on to all of your super old files.
- Will watch files in real-time, and back them up as soon as they change (or after a user-configurable number of minutes)
- If you're using their storage, it's just billed by total usage at the end of the month. If you back up 100GB and pare down to 40GB by the end of the month, you only pay for 40GB
- If you have multiple computers, they can use the same CrashPlan Central account, which pools the usage. So, you could back up two computers, each with 30GB (for example), and only pay $6 instead of 2 x $5. (There's a quick bill sketch after this list.)
- There are a lot of customizable options as well, such as when it runs, how long after a change a file will get backed up, how much CPU and bandwidth it uses, QoS, etc.
- It's cross-platform, and works the same on all platforms. That should be expected since it's written in Java, though. (Clarification: It's possibly only the front-end that's written in Java. The back-end engine appears to be a platform specific daemon or service.)
- A local restore option, in which you get the backup archive onto the machine you want to restore to, moving physical drives if necessary, then run the restore locally. This in theory would go much faster than even doing the restore over a LAN.
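Here's the bill sketch I promised above. The 10 cents/GB and $5 minimum come straight from their pricing; the pooling behavior is my understanding of it, so treat the function as my reading rather than gospel:

```python
def crashplan_central_bill(gb_per_computer: list[float]) -> float:
    """Monthly CrashPlan Central bill as I understand it: 10 cents per GB
    pooled across all computers on the account, with a $5 minimum."""
    return max(5.00, 0.10 * sum(gb_per_computer))

print(crashplan_central_bill([30, 30]))  # two 30GB machines pooled: $6.00, not 2 x $5
print(crashplan_central_bill([40]))      # my ~40GB work set alone: the $5.00 minimum applies
```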
I've been using it for a few weeks now, and I believe it will suit my needs well. I've already got it backing up my desktop at work and the server to CrashPlan Central. I could also install it on the backup machine at work to do my local backups through it, but I'm pretty sure I won't. There's just no need to burn the extra CPU overhead on compression and encryption for local backups, plus I don't want my backups on the LAN to be encrypted. If I need that data, I want at it fast, without anything standing in my way. So, I'll probably still use NTBackup and scheduled jobs to do the local backups, even though it's not as easy to use or polished as CrashPlan. However, CrashPlan seems like the way to go for the two computers that I need backed up off-site.
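For what it's worth, the NTBackup side of this is scriptable enough to keep the humans out of it too. Here's roughly what I mean, as a Python sketch that Task Scheduler could run nightly. The selection file, share path, and Friday-full/weekday-differential split are all just my hypothetical setup; check ntbackup's own help for the exact switches:

```python
import subprocess
from datetime import date

# Hypothetical paths: a saved NTBackup selection list, and a share on the backup machine.
SELECTION = r"C:\Backup\patient-data.bks"
TARGET = r"\\BACKUPBOX\Backups\nightly-" + date.today().strftime("%a") + ".bkf"

# Full ("normal") backup on Fridays, differentials the rest of the week,
# so restoring any day needs at most the last full plus one differential.
mode = "normal" if date.today().weekday() == 4 else "differential"

subprocess.run([
    "ntbackup", "backup", "@" + SELECTION,  # back up everything in the selection list
    "/j", "Nightly " + mode,                # job name, for the backup logs
    "/f", TARGET,                           # back up to a file instead of tape
    "/m", mode,                             # backup type; "normal" resets archive bits
], check=True)
```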
I'm not 100% convinced, though. I still have some concerns and some unanswered questions. The web site is very sparse and the documentation equally so. That's a testament to the ease of use of the product, but there are some options whose interactions could be fairly complex, and it would be helpful to have clear documentation about how they work. Either that, or the wording in the client could be clarified.
Another problem with the website is that there are lots of references to a business product (as opposed to their Pro product?), but no real information about what it is or why I should use it. Most of the links about it go to pages that describe the Pro product. The few oblique references I could find made it seem like a VMware server image used as a client/server thing for backing up multiple desktops in your organization. I only need to back up one desktop and a server; all the other computers here are glorified terminals. So maybe I don't need it. But someone probably does, and you can't sell your product if they don't know what it is or does.
The program is dog slow at backing up over the LAN. I've been trying to have my laptop seed the backup to the server here at work so I don't spend three weeks doing the initial backup from home. I'm not allocating it all of the CPU (unknown whether or not it supports dual processors), so maybe that's the bottleneck, but it's barely faster on the LAN than over the cable modem at home. It's certainly not the 30-50 times faster than the Internet that they claim on their web site. 1.3 to 1.5 times faster, maybe...
Another thing is that it doesn't do VSS or have any way of backing up open files. I have scripts shutting off all the computers at 6 PM, so there shouldn't normally be any open files in the practice database by the time the backup starts, but I do work late frequently, and I'm sure that will be an issue at some point. If the backup runs often enough it's not much of an issue, but still. They have a beta client with VSS support, but it's XP-only, so it apparently doesn't work with Windows Server.
The versioning only lets you retain unlimited versions or a specific number. That's useless to me. Our database is a set of files that all have to be in sync, or massive corruption will occur. Having the 7 most recent versions of a file that changes every 2 months extends pretty far back, but then I'd also only have the latest 7 versions of a file that changes hourly. So, no matter what number I set, I can only go back as far as the most frequently updated file allows. That means I have to set it to unlimited or not bother at all. I'd really rather have a time-based option like "retain the last x days' worth of versions" or something. Think Time Machine.
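To be concrete about what I'm wishing for (and this is my wish list, not anything CrashPlan actually offers), the policy would look something like this:

```python
from datetime import datetime, timedelta

def versions_to_keep(version_times: list[datetime], keep_days: int = 14) -> list[datetime]:
    """Time-based retention (my wish, not a CrashPlan feature): keep every
    version newer than the cutoff, plus the single newest version older
    than it, so any file can be rolled back to any moment in the window."""
    cutoff = datetime.now() - timedelta(days=keep_days)
    keep = [t for t in version_times if t >= cutoff]
    older = [t for t in version_times if t < cutoff]
    if older:
        keep.append(max(older))  # the version that was current as of the cutoff
    return sorted(keep)
```

The line that matters for my in-sync database is that last append: the file that changes every two months keeps the version that was current at the cutoff, so the whole set of files can be restored consistently to any point in the window, no matter how often each individual file changes.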
Having to keep unlimited versions is an issue because I can't find out whether the CrashPlan Central usage you pay for is the actual total of all versions of all files on the server, the actual de-duped disk usage of all versions on the server, or just the usage of the current set without regard to versions or deleted files. If it's everything, including versions and deleted files, and you cycle through files a lot, you could quickly find yourself with a set that takes 20GB on your box but 200GB at CrashPlan Central.
And then there are the options that might look self-explanatory, but really aren't, especially when you try to figure out how they interact:
There's an option for whether or not to keep deleted files at CrashPlan Central and for how long, but if you choose to never remove them, how long are deleted files kept? Is it really forever? Or is it in some way tied to the versions number? If really forever, how does that figure into disk usage? And what happens in the case of a file that keeps getting deleted and then recreated? I might see that as multiple versions; CrashPlan may see that as a bunch of separate deleted files. If I don't want to keep versions, but do want to retain deleted files, what happens there?
There's a setting for "Back up changed files after:" set to some number of minutes. On the surface, this seems useful for controlling the number of versions of a file I end up with. If I've got a file that changes every minute, I might only want one version per hour at most, not 60. But how does this option really work? If it's set to 60, does that mean the file won't get backed up until 60 minutes have elapsed with no changes to the file? Or does it mean the file won't get backed up until 60 minutes have elapsed since the last time it was backed up? If it's set to 60, and the last time it was backed up was two days ago, and the file changes, how long after the change will it get backed up? Less than or greater than 60 minutes?
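In code terms, those two readings are completely different policies, and nothing in the wording tells you which one you're getting. A sketch of both (the 60 minutes is just the example above):

```python
from datetime import datetime, timedelta

DELAY = timedelta(minutes=60)

def backs_up_reading_1(last_change: datetime, now: datetime) -> bool:
    """Reading 1: the file must sit quietly for 60 minutes before backup."""
    return now - last_change >= DELAY

def backs_up_reading_2(last_backup: datetime, changed: bool, now: datetime) -> bool:
    """Reading 2: a changed file backs up right away, but no more than
    once every 60 minutes."""
    return changed and now - last_backup >= DELAY
```

Under reading 2, my file that was last backed up two days ago would go essentially immediately after the change; under reading 1, it would wait at least another hour. Same setting, very different behavior.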
The program has by default a realtime scan for changes in files, as well as a set interval to do a full filesystem scan, looking for any changed files that the real-time scan missed. If real-time scanning is off, and "Verify Backup Selection Every:" is set to only every 7 days, will nothing get backed up in between? In other words, when real-time scanning is off, is the verify scan the only way that CrashPlan knows to back up a file?
So, if real-time is off, and a file is changed at 5:55 PM on Tuesday, and "Back up changed files after:" is set to 60, and the verify scan is set to run at 6:00 PM every 7 days and actually starts at 6:00 PM that Tuesday, what happens? Does the file get caught by that scan and backed up? Does it get caught, but not backed up, because it hasn't been 60 minutes since the change? Does it get flagged for backup once 60 minutes have elapsed? Or does it not get backed up until the next scan catches it the following Tuesday?
I'm good at QA and setting up test scenarios, so given a few days I could answer these questions by myself. However, I shouldn't have to. They should have clear wording for the options themselves, and enough info in the documentation to be able to answer them.
When the documentation (PDF) says something unclear, it has what looks to be a link to further information. For example, when talking about versions, there's a blue sentence afterwards that says "What happens if I keep all the versions of a file?" Hey, that's what I'd like to know. It looks like a link to a FAQ or something, but it's not; it's just blue text. I thought maybe these were links that broke in the PDF conversion, but the more I look at it, the more these look like notes from QA or a technical writer: questions they needed answered by development. Maybe some developer wrote the first draft, then some tech writer cleaned it up and added some notes to answer later, then got hit by a bus. Not knowing the tech writer wasn't actually done with it before his untimely demise, they just threw it straight on the web. Also, the screenshots don't match the currently shipping product. Not good form, guys.
As for my home needs, I've got enough space left over on the server at work to do a full online backup of my laptop through CrashPlan, free of charge (after the initial software license fee), to that server. So now I know that whatever happens, I'm fully backed up off-site in a very current fashion. That's never been the case before, and I'm totally excited by that. I'd like a more readily accessible local backup, though, so I'll probably still use Time Machine and its crappy exclusions interface just to get my more crucial stuff into locally accessible form. In the long term, I'll either get a nice big external drive, or a Time Capsule, or something like that, and do it better.
As for the camcorder, I held off on buying it for a while because I assumed flash would end up way more expensive by the time I backed up the files in a reliable (i.e., redundant and off-site) way. Once I ran the numbers, though, I saw that although the initial investment in backup media may be higher, it's actually cheaper on a per-hour basis.
DV tapes, bought in bulk from the Price Club^H^H^H^H^H^H^H^H^H^HCostco, might run about $3 apiece, with each tape holding an hour. This camcorder at its highest settings uses a little under 8GB for an hour of footage. If I look around for the right deal, I can get a 750GB external drive for $100, or about $1/hour. Drives are inherently less reliable than tapes, so I need to get a second one to compensate, but even then I'm still only at around $2/hour. And if I take one drive to work, I've got the safety of off-site storage, which I never had with tape. Yeah, a tape will last forever with at most a couple of dropouts or sparklies, but if my house burns down, it's gone.
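For the curious, the actual arithmetic; the $100-per-750GB figure is just whatever deal I expect to find, so treat these as ballpark numbers:

```python
GB_PER_HOUR = 8              # HF100 at highest quality: a little under 8GB/hour
DRIVE_GB, DRIVE_COST = 750, 100.00

tape_per_hour = 3.00                                    # bulk DV tape, one hour each
drive_per_hour = DRIVE_COST / (DRIVE_GB / GB_PER_HOUR)  # ~$1.07/hour on a single drive
paired_per_hour = 2 * drive_per_hour                    # ~$2.13/hour with a redundant twin

print(f"tape ${tape_per_hour:.2f}/hr vs. drives ${paired_per_hour:.2f}/hr, off-site included")
```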
In the short term (like for the first few GB of footage), I'll just store the files from the camcorder on my hard drive and let them get backed up with everything else, either locally or off-site. Then, when that pile gets too big, I'll move them to two of the external drives I already have, and carry one off-site. When the file pile gets too big for those (the smallest external I currently have is 40GB), only then will I have to shell out for new external drives dedicated for storing video. At that time, pricing might be close to 50 cents per hour, making it way cheaper, indeed.
In the longer term, when I've got a 4TB drive in my laptop and 100 Mbit/s of upload bandwidth on my Internet connection, I'll reintegrate all my movie footage onto my laptop the same way I keep all my photos and music there, and just let it get backed up with everything else. In the meantime, I'm fine with using offline storage for the video, because I really don't need access to every minute of footage I've ever shot all the time.