
Friday, October 16, 2009

Virtualization and Backup in the development environment - some thoughts

"Virtualization shakes up backup strategies!" So reads a Computerworld article I ran into while searching for what's new in the backup of virtual machines. Backup of virtual machines is an issue I've been bothered with ever since I made my first steps into virtualization, a couple of years ago. (Yes, I know the mainframe guys have been tackling these issues for decades, in their own ways. But for a Windows/Linux guy such as myself, virtualization as an everyday practice is something relatively new: about three years as something to be aware of, and about a year and a half as something actually used in my technical life.)

Say what you will about virtualization: like any paradigm-changing technology, once you get used to it, it pops up everywhere.

And one of the main problems with these mushrooms-after-the-rain technology expansions is that infrastructure issues are frequently neglected, and my own experience teaches me that backup is the subject most likely to be put aside for later. Backup is the ultimate winner of neglect, the poster boy of the not-urgent-enough, and the single most important action a sysadmin can pre-emptively promote to save the day.

Looking back at my own career, I'm surprised at the number of incidents in which I was the one to raise the question of "how are backups done?" about a server or a project, only to discover that this was an issue nobody had ever gotten around to dealing with.

Over the years, whenever backup routines I had initiated or promoted were tested under real-life crashes and data disasters, those routines proved to be reliable and enabled a swift return to work. But dream as you might about the appreciation you get after a crash well handled, first you must tackle the frustrating effort of promoting backup before the crash, dealing with servers that frequently fall between departments or are perceived as not critical enough ("it's just for development, no?" I remember a manager asking about a Unix database machine that served several development teams and had never been backed up since it was set up).

Worse, you might have brought everything under your responsibility to a perfectly backed-up state, routine restore mechanisms included. But that was yesterday; virtualization brings new machines into play, some of which you might not even think of when servers come to mind, and it forces you to look forward, even if that means abandoning old techniques and developing new approaches.

One of the key advantages of virtualization is the ease of restoring a whole machine. In development environments, my own specialty, this creates a huge advantage: you can run a nightly backup that creates a full snapshot of each guest, and, assuming you are equipped to recover quickly from a hardware disaster, the time it will take to return to business as usual is the time it takes to rebuild the host. If you really care about such things, you will have more than one host to begin with, and then, assuming you work with shared storage, the return-to-business time can be as short as the time it takes to import the backed-up image of the machine into the other host.
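
To make the idea concrete, here is a minimal sketch of such a nightly full-image backup, assuming a Linux host driven through libvirt's virsh command line; the guest names, image paths and backup target are illustrative placeholders, not a description of any particular environment:

#!/usr/bin/env python
# Nightly full-image backup of development guests.
# A minimal sketch assuming a libvirt/KVM host managed with the "virsh"
# command-line tool; guest names, image paths and the backup target
# below are illustrative placeholders.
import datetime
import shutil
import subprocess

GUESTS = {
    "dev-db-01": "/var/lib/libvirt/images/dev-db-01.img",
    "dev-web-01": "/var/lib/libvirt/images/dev-web-01.img",
}
BACKUP_ROOT = "/mnt/shared-storage/vm-backups"  # the shared storage mentioned above

def backup_guest(name, image_path):
    stamp = datetime.date.today().isoformat()
    target = "%s/%s-%s.img" % (BACKUP_ROOT, name, stamp)
    # Pause the guest so the disk image is copied in a consistent state.
    subprocess.check_call(["virsh", "suspend", name])
    try:
        shutil.copy2(image_path, target)
    finally:
        # Always resume the guest, even if the copy failed.
        subprocess.check_call(["virsh", "resume", name])
    return target

if __name__ == "__main__":
    for guest, image in GUESTS.items():
        print("backed up %s to %s" % (guest, backup_guest(guest, image)))

Restoring onto the second host then amounts to copying the image back from the shared storage and defining the guest there.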

But such a full-backup scheme must not be adopted before other key issues are handled. A good backup strategy must answer granular needs, the most common of which is the need to restore a single element: that file that was deleted or changed several weeks ago and is now needed as it once was. If we rely only on full snapshots, the restore time may create trouble, especially in environments where the existence of two identical virtual machines causes problems. Solutions can be found, some more elegant than others, and the problem intensifies when we are talking not about file-system restores but about databases that reside within virtual machines. At this stage, the value of exporting the database on a nightly basis becomes apparent. Such a solution is naturally suitable only for small to medium size databases. As the information revolution creates larger and larger data sets, development environments too are growing in size, bringing complications previously known mainly in production environments to the table of the development environment's specialist.
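
As an illustration of that nightly export, here is a small sketch assuming a PostgreSQL database inside one of the guests; the database name, role, dump directory and retention window are hypothetical, and, as said above, the approach only makes sense for small to medium databases:

#!/usr/bin/env python
# Nightly logical export of a database running inside a guest.
# A minimal sketch assuming a PostgreSQL instance reachable from the
# machine running the script; database name, role and dump directory
# are hypothetical, and the retention window is arbitrary.
import datetime
import glob
import os
import subprocess

DB_NAME = "devdb"
DB_USER = "backup_role"
DUMP_DIR = "/backups/db-exports"
KEEP_DAYS = 14  # how many nightly dumps to retain

def nightly_export():
    stamp = datetime.date.today().isoformat()
    dump_file = os.path.join(DUMP_DIR, "%s-%s.sql" % (DB_NAME, stamp))
    # Plain-SQL dump: a single table or row can later be restored from it
    # without bringing up a second copy of the whole virtual machine.
    subprocess.check_call(["pg_dump", "-U", DB_USER, "-f", dump_file, DB_NAME])
    # Drop dumps older than the retention window.
    cutoff = datetime.date.today() - datetime.timedelta(days=KEEP_DAYS)
    for old in glob.glob(os.path.join(DUMP_DIR, "%s-*.sql" % DB_NAME)):
        date_part = os.path.basename(old)[len(DB_NAME) + 1:-4]
        try:
            if datetime.date(*map(int, date_part.split("-"))) < cutoff:
                os.remove(old)
        except ValueError:
            pass  # ignore files that do not match the naming scheme

if __name__ == "__main__":
    nightly_export()

The point of the logical dump is granularity: a lost table or row comes back from the SQL file, and the full snapshot stays reserved for whole-machine disasters.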

And yet, as complicated as life may become, we must not forget the simplest questions: 
1. If a hardware failure strikes tomorrow, can we return with ease to a working configuration?
2. If the answer to question 1 is "no", what does that mean? (How complicated will the recovery effort be? How much data will be lost? How tolerant is our organization of a lost development server that cannot be restored?)

And to put it in managerial terms: what is the cost/benefit of postponing the creation of a backup solution for our virtual machines, compared with the cost/benefit of the possible solutions?
(When this question is asked, the assumption that there will be a failure at some point in the not-too-distant future is essential. Do not accept any other. And don't forget to propose the possibility of a disk crash on the eve of a deadline.)