This documentation concerns version 1.3.2_dynamic_ip of yarbu and is available as archived html1, pdf2, postscript3 or online4.
It is possible that you have been using computers for years without ever finding yourself in the situation where you lose precious data through accidental deletion, malicious action or hardware failure. Then again, if that were the case you would probably not be reading this. If you have been the victim of data loss, my commiserations. Now, never let it happen again!
This project is, as the name suggests, another backup / restore utility based around the well-known, powerful rsync programme5. It is designed to act as a server on a dedicated backup machine that will periodically back up a set of remote clients in a robust, transparent manner. To download and install the latest version, please see the project download area6 of SourceForge.
As depicted in figure 1, the backup server contacts various clients. These clients can be heterogeneous and can themselves be servers for other machines. The example below indicates a PC (Linux) client, a Sun Solaris client and a Linux machine acting as a Samba server for other Windows PCs, all being backed up by a central backup server running yarbu.
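[Figure 1: a central backup server running yarbu backing up a Linux PC, a Sun Solaris machine and a Linux Samba server.]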
The concept of rotating snapshot-like backups is nothing new. Much of this script is inspired by some of the work done by Mike Rubel7 and others. A list of alternative scripts and projects can be found in the section Contributed codes8. I have found a number of (minor) issues that these scripts do not all tackle. It is my aim for this project to fulfil the following criteria:
There are naturally a number of different ways of approaching the problem of backing up a directory. In general we would like to be able to restore the state of every file within a certain directory on a machine exactly as it was at a given time in the past. This means preserving the ownership, time stamps, permissions and other attributes of files or symbolic links, and archiving all these files in some easily recoverable place.
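As a rough illustration of what preserving these attributes can look like with rsync, the sketch below builds (but does not run) an rsync command line. It is not taken from yarbu itself: the function name `build_rsync_command`, the snapshot directory names `hourly.0` and `hourly.1`, and the use of `--link-dest` to hard-link unchanged files against the previous snapshot are all assumptions made for illustration.

```python
from typing import List, Optional


def build_rsync_command(
    source: str,
    snapshot_dir: str,
    previous_snapshot: Optional[str] = None,
) -> List[str]:
    """Return an rsync argument list for one snapshot run (hypothetical helper)."""
    cmd = [
        "rsync",
        "-a",             # archive mode: permissions, times, owner, group, symlinks
        "--numeric-ids",  # keep uid/gid numbers rather than mapping by name
        "--delete",       # remove files from the snapshot that vanished on the client
        "-e", "ssh",      # transport over SSH
    ]
    if previous_snapshot is not None:
        # Unchanged files become hard links into the previous snapshot,
        # so each snapshot looks complete but only changes use new space.
        cmd.append(f"--link-dest={previous_snapshot}")
    # A trailing slash on the source copies the directory's contents.
    cmd += [source.rstrip("/") + "/", snapshot_dir.rstrip("/") + "/"]
    return cmd


if __name__ == "__main__":
    print(" ".join(
        build_rsync_command("apples:/home", "/backups/hourly.0", "/backups/hourly.1")
    ))
```

Preserving ownership in this way generally requires the receiving rsync to run as root, which is one reason the security of the backup server matters.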
Traditional backup systems are based upon tape drives. With the price of hard disk storage plummeting, the cost-benefit balance of different storage media has changed dramatically. Since a good automated tape drive costs many thousands of pounds, hard disks still offer a cheaper storage option for medium-sized institutions (hundreds of machines), despite the ultimately lower cost of tape media, with the bonus of essentially instant access to the required data.
Ideally we wish the backup system to be as divorced from the detail of the clients as possible. We do not want to have to install packages on the client machines since this can become an administrative nightmare. Far better is to treat the client machines as “dumb” since this allows all the logic to be centred on the server.
This project is initially aimed at a small to medium-sized installation: tens of machines rather than hundreds. It is assumed that these machines are connected to a reliable local area network and are nominally available 24 hours a day, 7 days a week.
Configuration for this backup system is based around familiar, simple configuration files which are essentially just a list of parameter/value pairs. It is deliberately designed to be straightforward. If you wish to do weird and wonderful things with your backup strategy, your backup strategy is probably flawed. For example, the following configuration file will back up the /home directory of a machine called “apples” to a local directory /backups on the backup server.
SOURCE=apples:/home
TARGET=/backups
Aside from configuring root SSH access, that is all the configuration that is required. If you have downloaded and installed the RPM version of the software, adding the above file as (say) /etc/yum/conf/apples.conf is all you need to do to get rolling hourly backups.
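To make the parameter/value format concrete, here is a minimal sketch of how such a file could be read. It is not yarbu's own parser (the current script is bash); the function names `parse_config` and `split_source`, and the treatment of lines starting with `#` as comments, are assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


def parse_config(text: str) -> Dict[str, str]:
    """Parse PARAMETER=value lines, skipping blank lines and '#' comments."""
    settings: Dict[str, str] = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            raise ValueError(
                f"line {line_number}: expected PARAMETER=value, got {raw!r}"
            )
        settings[key.strip()] = value.strip()
    return settings


def split_source(source: str) -> Tuple[Optional[str], str]:
    """Split 'host:/path' into (host, path); host is None for a local path."""
    host, sep, path = source.partition(":")
    if sep and "/" not in host:
        return host, path
    return None, source


if __name__ == "__main__":
    config = parse_config("SOURCE=apples:/home\nTARGET=/backups\n")
    print(config, split_source(config["SOURCE"]))
```

Rejecting malformed lines outright, rather than silently skipping them, fits the stated aim of keeping the configuration simple: a typo in a backup configuration is better caught at start-up than discovered at restore time.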
While the configuration is simple, it could be argued that it is too simple. Most of the time a set of machines will be configured in a similar manner. It would be good to have a master configuration file which could include placeholders, and a separate file that gives a list of parameters (for example a set of host names). Alternatively one might want a single, hierarchical type of configuration file in the manner of Apache. Implementing such a system would of course severely complicate the code and be fiendishly difficult to do using just bash commands.
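The master-file idea is much easier to express outside bash. The following sketch expands one template over a list of host names using Python's standard `string.Template`. The `${host}` placeholder syntax, the per-host target directory, the `expand_master` helper and the second host name “pears” are all hypothetical; nothing like this exists in the current script.

```python
from string import Template
from typing import Dict, Iterable

# Hypothetical master configuration with a ${host} placeholder.
MASTER = Template("SOURCE=${host}:/home\nTARGET=/backups/${host}\n")


def expand_master(master: Template, hosts: Iterable[str]) -> Dict[str, str]:
    """Return a mapping of per-host file name to expanded configuration text.

    Template.substitute raises KeyError if the master refers to a
    placeholder that is not supplied, so mistakes surface early.
    """
    return {f"{host}.conf": master.substitute(host=host) for host in hosts}


if __name__ == "__main__":
    for name, body in expand_master(MASTER, ["apples", "pears"]).items():
        print(f"--- {name}\n{body}", end="")
```

Generating ordinary per-host files from the master keeps the format each backup run reads exactly as simple as before.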
Another issue is one of reliability. It would be nice if unreliably connected machines could also be handled easily. An example of an unreliably connected machine is a laptop, or a desktop that is switched off at the end of the day. In keeping with the idea of the dumb client, we would need to place the logic and machinery for this in the backup server. This means the backup server would have to regularly monitor (through ‘ping’ probably) the state of all the client machines. At the moment, if a machine misses a daily backup, that backup is simply not done. This is clearly an unsatisfactory situation.
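One way the server-side logic could look is sketched below. Note that it swaps the ‘ping’ mentioned above for a TCP connection attempt to the SSH port, on the assumption that SSH is the service the backup actually needs; the function names, the port number 22 and the one-day interval are likewise assumptions, not features of yarbu.

```python
import socket
from datetime import datetime, timedelta
from typing import Callable, Optional


def is_reachable(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False


def backup_due(
    last_success: Optional[datetime],
    now: datetime,
    interval: timedelta = timedelta(days=1),
) -> bool:
    """A backup is due if none has ever succeeded or the last one is too old."""
    return last_success is None or now - last_success >= interval


def should_back_up_now(
    host: str,
    last_success: Optional[datetime],
    now: datetime,
    probe: Callable[[str], bool] = is_reachable,
    interval: timedelta = timedelta(days=1),
) -> bool:
    """Back up only when a backup is overdue and the client is currently up."""
    return backup_due(last_success, now, interval) and probe(host)
```

If the server evaluated `should_back_up_now` for each client every few minutes, a laptop that missed its scheduled slot would be picked up the next time it appears on the network, instead of waiting for the following day.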
One should realise that it is possible to do disastrous things with this script. Implicit is the assumption that the backup server is “friendly”, in other words trusted. Since the backup server requires password-less root access to the clients, compromising the backup server potentially compromises every client it backs up.
As an initial proof of concept this project is rapidly maturing. A number of issues regarding ease of use and more complex interaction have arisen, however, that will cause this project to branch to a higher main version number relatively soon. The 1.1.x branch should now be considered as discontinued.
The current branch 1.2.x is now stable and does what was originally envisaged. As with much software development, one only really knows what is wanted when the finished product's limitations become evident. The roadmap for the 1.2.x branch is as follows.
It is fairly clear now that the original design aims of making the script bash only and system independent are somewhat conflicting. For this reason new versions will be scripted in Python.
The author, Dr. Edward Grace, is based at Imperial College London researching optical data storage techniques. His home page9 contains more information about this research and other projects.