You guys talk about the how and where to. I'm talking about the what, which I think comes first.
Well, I was talking about a full live snapshot... which kind of means everything, so the what was inherently covered.
So I need to find all the dirs/files that need to be monitored for changes, in order to include them in the incremental part of the backup when they do change.
The approach I picked up at a previous job, when I was under the wing of a long-time UNIX hacker, was that instead of whitelisting what you think you want, you blacklist what you know you don't need. The argument being that when you come to restore, it's better to have everything available should you need it, rather than to find out that you forgot to add some important dir to your whitelist.
We'd found with most incrementals that, outside of /tmp and the databases (and in this case the home/shares data), most day-to-day change is just log activity. Every once in a while other things would change - we'd update a script or install some lib or something - so we'd just run system-wide incrementals, as they tended to be quite small. We'd obviously do things like exclude database datadirs, as we'd handle those separately with the full backups and the database backups.
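Just to illustrate the blacklist idea (the exclusions below are made up and the real list would be longer), a full run would be something along the lines of:
Code: Select all
tar -czpf /mnt/backups/`date +%Y%m%d`-full.tgz \
    --exclude=/proc --exclude=/sys --exclude=/tmp \
    --exclude=/var/lib/mysql --exclude=/mnt/backups /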
Towards the end of my time there, we came across and started using
hcp, and we had a relatively basic script that would mount a hotcopy and then run something like:
Code: Select all
find /var/hotcopy/sda1 \( -wholename '/var/hotcopy/sda1/proc' \
    -o -wholename '/var/hotcopy/sda1/some/oracle/data' \
    -o -wholename '/var/hotcopy/sda1/some/other/exclusion' \) -prune \
    -o -type f -mmin -1440 -print \
    2>/dev/null | tar -czf /mnt/backups/`date +%Y%m%d`-backup.tgz -T -
(note, that's an example off the top of my head so it's probably got a couple of errors)
This would find all files changed in the last 1440 minutes (i.e. one day; you could use -mtime -1 instead) and tar them up into a datestamped tgz. The tgz would then get scp'd off. The script would go on to send us an email report showing the output of the scp and of an ls -lht /mnt/backups | head -n 10, which would show us the last 10 backups and their sizes. That gave us at-a-glance trending: if a backup jumped significantly in size over the previous one, we could investigate. The script also re-ran the find and piped it out into a datestamped txt file, so for that investigation we could simply diff it against the previous day's list.
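The reporting end of the script was roughly along these lines (not the real thing - the hosts and addresses here are made up):
Code: Select all
STAMP=`date +%Y%m%d`
REPORT=/tmp/$STAMP-report.txt
scp /mnt/backups/$STAMP-backup.tgz backup@offsite.example.com:/backups/ > $REPORT 2>&1
echo "--- last 10 backups ---" >> $REPORT
ls -lht /mnt/backups | head -n 10 >> $REPORT
# keep today's file list so tomorrow's run can be diffed against it
find /var/hotcopy/sda1 -type f -mmin -1440 > /mnt/backups/$STAMP-files.txt
mail -s "backup report $STAMP" admin@example.com < $REPORT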
Before using hotcopy, we used trickery along the lines of:
Code: Select all
# prepare the stamp for tomorrow's run
touch /tmp/`date +%Y%m%d`-stamp.txt
yesterday=/tmp/`date -d "1 day ago" +%Y%m%d`-stamp.txt
find / -newer $yesterday -type f 2>/dev/null | tar -czf /mnt/backups/`date +%Y%m%d`-backup.tgz -T -
Of course the script was a bit more in depth than that, but it would fail on open/locked files where the hotcopy version didn't.
MySQL backups were handled by
AutoMySQLBackup, which I wholly recommend. Postgres was just a pg_dump script, and Oracle was similarly scripted.
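The Postgres side was nothing special; something along these lines covers the idea (run as the postgres user, and the paths are placeholders):
Code: Select all
#!/bin/bash
# dump every non-template database into a datestamped, gzipped file
STAMP=`date +%Y%m%d`
mkdir -p /mnt/backups/pgsql
for db in `psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate;"`; do
    pg_dump "$db" | gzip > /mnt/backups/pgsql/$STAMP-$db.sql.gz
done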
Also, this procedure should allow me to re-create the HDA I had, but on other hardware.
Say I'm running on i386 now, and my CPU breaks for whatever reason, and I decide to buy a new one, but an x86_64 this time. I want to be able to get back my HDA... I just upgraded the CPU!
Full metal backups won't help in those cases.
i386 will run fine on x86_64.
It's when you go the other way, or to some completely different architecture like ARM or SPARC, that you'll have trouble. See Moredruid's post. The advantage here with having a full snapshot is that you've already got everything. Just reinstall Amahi, mount your snapshot .img and copy out what you need. This is where your approach can help, and this is probably a good start:
http://www.cyberciti.biz/tips/linux-get ... store.html
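For the "mount your snapshot .img" bit, it's just a loopback mount - something like this (paths made up, and it assumes the image is a single filesystem rather than a whole disk with a partition table):
Code: Select all
mkdir -p /mnt/snapshot
mount -o loop,ro /mnt/backups/hda-snapshot.img /mnt/snapshot
# copy out whatever you need, e.g. old configs
cp -a /mnt/snapshot/etc/samba/smb.conf /root/restore/
umount /mnt/snapshot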
/edit: Building on the above, and doing it your way, you'd probably want to refer to this thread as a
configuration backup and restore. I think for that you could just generate a nice xml file that holds the install code (which itself would look after a bunch of settings), the installed software noted above, a list of installed apps, etc. You want the output of a theoretical hda-config-backup to be as small as possible: a tgz or zip with a couple of files in it. The hda-config-restore process unzips this file and greps out what's required. Having a separate database dump should be considered useful but not mandatory IMHO.
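To be clear, hda-config-backup doesn't exist - this is just how I'd picture the gathering half of it (every path and file name below is made up):
Code: Select all
#!/bin/bash
# theoretical hda-config-backup: gather the small stuff that describes the HDA
STAMP=`date +%Y%m%d`
WORK=`mktemp -d`
rpm -qa | sort > $WORK/installed-packages.txt
# hda-config.xml would carry the install code, app list, settings etc.
# (however Amahi exposes those - hand-waving here)
tar -czf /mnt/backups/$STAMP-hda-config.tgz -C $WORK .
rm -rf $WORK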
If that's what you meant, then I like it, and apologise for going off on a tangent.