As we all know, IOWAIT is the Professor Moriarty of good performance. So, I decided to replace the oldest, least powerful root server with a brand-new one. What follows is a rough overview of the steps involved in a non-parallel upgrade of a production system, which – at least in my case – succeeded perfectly.
Virtualization modularizes the overall functionality. However, the root server itself needs to be a stable, scalable (as in: enough RAM and CPU power) and ideally secure platform. In my case, that also meant server hardening, a VPN, a robust firewall ruleset, GPG, monitoring of e.g. netflow and system logs, as well as libvirt for easier VM maintenance.
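Just to illustrate the "robust firewall ruleset" part, a minimal default-deny policy might look like the sketch below – interface names, the VPN port and the bridge for the VMs are assumptions, not my actual ruleset:

```
# Minimal default-deny sketch (interfaces/ports assumed, not the real ruleset)
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Keep loopback and established connections working
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT

# SSH, plus an (assumed) OpenVPN endpoint on UDP 1194
iptables -A INPUT -p tcp --dport 22 -j ACCEPT
iptables -A INPUT -p udp --dport 1194 -j ACCEPT

# Let the libvirt bridge forward traffic for the VMs (bridge name assumed)
iptables -A FORWARD -i virbr0 -j ACCEPT
iptables -A FORWARD -o virbr0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
```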
The first item on the todo list is planning the backup of the production VMs, which must be taken right before the actual upgrade so that all data is current. A while ago I used to convert the RAW images to QCOW2 first (also as a rough disk integrity check) and then encrypt the resulting file with gpg. However, since I/O on the old root server was rather painful, I skipped the conversion and encrypted the RAW images directly (gpg compresses the data anyway), which has worked out very well.
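For illustration, both variants boil down to something like this – file names are placeholders, and gpg's symmetric mode stands in for whatever key setup you actually use:

```
# Shut the VM down first to get a consistent image, e.g.: virsh shutdown vm01

# Variant 1: convert first (doubles as a crude integrity check), then encrypt
qemu-img convert -f raw -O qcow2 vm01.raw vm01.qcow2
gpg --symmetric --cipher-algo AES256 vm01.qcow2   # produces vm01.qcow2.gpg

# Variant 2 (what I ended up doing): encrypt the RAW image directly;
# gpg compresses the stream by default, so zeroed regions shrink a lot
gpg --symmetric --cipher-algo AES256 vm01.raw     # produces vm01.raw.gpg
```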
What you should ask yourself now (a quick way to estimate this follows below):
- How big is every VM (and how much data is actually in it)?
- How long does it take to create a backup of it?
- How long does it take to transfer the backup?
- Result: how long does it take for all VMs to be safely backed up?
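A rough way to answer these questions up front – the image path and target host are placeholders:

```
# Apparent vs. actually allocated size of a (possibly sparse) RAW image
ls -lh /var/lib/libvirt/images/vm01.raw
du -h  /var/lib/libvirt/images/vm01.raw

# Time the encryption of one VM to extrapolate for all of them
time gpg --symmetric --cipher-algo AES256 vm01.raw

# Estimate transfer time from a test copy (target host assumed)
time scp vm01.raw.gpg backup@othersrv.example.org:/backups/
```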
Since we are already using git-annex for all of our backups thanks to a tip from rpw, we of course use it to store the backups redundantly on multiple other root servers. All configuration files are backed up redundantly as well, so that we can roll them out quickly afterwards.
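In git-annex terms, spreading the encrypted images across several servers looks roughly like this – repository path and remote names are assumptions:

```
# Add the encrypted images to the annex
cd /backups/annex
git annex add vm01.raw.gpg vm02.raw.gpg
git commit -m "pre-migration VM backups"

# Push the content to other root servers acting as remotes (names assumed)
git annex copy --to=srv2 vm01.raw.gpg vm02.raw.gpg
git annex copy --to=srv3 vm01.raw.gpg vm02.raw.gpg

# Verify that enough copies exist before touching the old machine
git annex whereis vm01.raw.gpg
```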
So, after the steps above are done, the data on the old server can be destroyed. Wiping the disks is the most secure option, but not really feasible because we are in a hurry to get a production system back. Reinstalling the old server four times with a fresh Linux works too, and since most sensitive data was stored GPG-encrypted anyway, less care is needed for secure deletion.
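For completeness, had time allowed a proper wipe, a single overwrite pass is already a solid baseline – the device name is of course an assumption:

```
# Overwrite the whole disk once with pseudo-random data (device name assumed!)
shred --verbose --iterations=1 /dev/sda

# Alternative with dd; status=progress shows throughput along the way
dd if=/dev/urandom of=/dev/sda bs=1M status=progress
```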
The data center involved worked really fast: it took them less than 30 minutes to disconnect the old server and connect the new one in their rack, even though it was around 2 A.M.
Once the fresh system was connected, the configuration rollout could take place. The backups were then restored via git-annex (hint: git annex reinit $OLDUID really made sense here!) – or via scp for just the production VMs at first. It is also worth taking the time to adjust the CPU feature set of the VMs to the new hardware, e.g. using virt-manager, which also gets rid of some ugly KVM warnings.
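The restore plus CPU adjustment could look roughly like this – the UUID, repo path and domain name are placeholders, and host-model is just one way to match the feature set to the new CPU:

```
# Re-adopt the old repository identity in the freshly cloned annex,
# then fetch the VM images back from the backup remotes
cd /backups/annex
git annex reinit $OLDUID
git annex get vm01.raw.gpg
gpg --decrypt vm01.raw.gpg > /var/lib/libvirt/images/vm01.raw

# Let libvirt model the new host CPU instead of the old, hand-picked flags;
# in "virsh edit vm01", the <cpu> element becomes:
#   <cpu mode='host-model'/>
virsh edit vm01
```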
Last but not least, here are the times needed for the actual migration steps in my case:
- Backup of 5 production VMs and configs + transfer: ~ 3 hrs
- Implementation of configs on the new server: ~ 30 minutes
- Transfer and extraction of the production VMs: ~ 1.5 hrs
- “Production gap” caused by the whole migration: ~ 5 hrs
- Transfer of all git-annex data (430 GB): ~ 3 hrs
In most cases, good planning and realistic preparation (which clearly involves experience) are a must, and if you ask me, the whole exercise is also a lot of fun – especially when the new server has about four times the capacity and performance of the old one while being cheaper at the same time.