
An almost perfect rsync over SSH backup script

Last updated on 15. April 2023

Why and What

Creating backups is a basic necessity for anyone who stores data. With the help of a dedicated server, the Linux operating system and the rsync software, you can securely and efficiently retrieve data from any computer via an SSH connection and store it in a backup. It doesn’t matter if there is another Linux server, a Mac or a Windows computer on the other side. But as soon as you have more than a few text files, you have to control the process precisely.
This script contains some important elements that are essential when dealing with large amounts of data and slow internet connections.

A core script to start with

The basic script comes from the wiki.ubuntuusers.de page:
https://wiki.ubuntuusers.de/Skripte/Backup_mit_RSYNC/

The original script

  1. It distinguishes between TOSSH and FROMSSH, i.e. whether you push something to the backup server (TOSSH) or whether the backup server fetches it (FROMSSH). With FROMSSH no credentials for the backup server are stored on the computer to be backed up. Connection credentials to the backup are always the first target of encryption trojans, so it is better that the backup server fetches the data itself, i.e. it is the one that connects to the computer to be backed up -> FROMSSH
  2. It distinguishes between overwriting the folders with the day number every month or creating new folders with the full date as the name -> MONTHROTATE
  3. It can check if a volume to be backed up is connected to the computer via USB -> MOUNTPOINT (I’m not using this.)
  4. It can initiate the sending of a mail with the log of the backup process.
  5. It can read from multiple sources/directories -> array SOURCES
  6. It can delete files that no longer exist in the source also in the backup.

How it works

With a stored key (use SSH keys, not passwords!) the backup server connects to the computer to be backed up via SSH (FROMSSH). There it compares the files on the volume to be backed up with what it has in the folder that the symlink named „last“ points to. A new folder named after the day of the month (e.g. „02“) is then created, and all new or changed files are stored in it. For files that already exist unchanged, a hardlink is created in the new folder, so that they are all „present“ there too. Deleted files do not get a hardlink. When everything is done, the symlink „last“ is pointed to the new folder. In the next month the old folder with the same day number is simply overwritten with the current data (MONTHROTATE).
The date and every single transferred file is written into the log, which is created as an extra file and overwritten every time. This info is then sent as a mail at the end. The backup script is started via crontab of the special backup user (who has no sudo rights) every day.

Disadvantages

  1. If an rsync process takes longer than 24 h, the processes overlap and accumulate (the old one is not finished when a new one starts), block each other, and nothing works anymore.
    -> A new process should therefore only start when the previous one is finished.
  2. In large volumes with many files, the log files become so long that they no longer fit into an email and the email is then no longer sent.
    -> For large volumes only errors should be logged.
  3. If an error occurs, a symlink to the newly created folder with the current day as its name is created anyway. As a result, the next backup transfers the complete data set again. With several terabytes this can take a month of data transfer over the internet.
    -> Only change the symlink if there was no error before, otherwise keep the symlink to the healthy folder until there is a backup without errors again. Disadvantage: A second backup on the same day is then not possible even with manual start of the script.
  4. The backup server is only needed for a short time, but runs the whole day.
    -> Use a separate script to shut down the server in the meantime, but only if no backup is currently running and no one is currently logged in via SSH. Separate this task from the backup user and execute it via the administration user or the central /etc/crontab, since root rights are needed for this. (This is not part of this blog.)

Necessary enhancements

  1. A code block that monitors the rsync traffic.
  2. To keep track, a file that stores all log entries, one after the other, with only a few lines per entry -> all-log
  3. Ensuring that the new symlink is only set if the backup was successful

big heat sink, less fans – zazu.berlin environmentally friendly backup server

The complete advanced script with detailed explanation

Part 1

#!/bin/bash

# avoiding collisions with other rsync processes

# the minutes passed to the script 
# 21h*60min = 1260 minutes
minutes=1260

# Make sure no one else is using rsync
pro_on=$(ps aux | grep -c rsync)

# if someone is using rsync
# grep is also producing one entry so -gt 1

while [ $pro_on -gt 1 ]
  do
  sleep 60
  minutes=$(($minutes - 1))

# we are close to the next day no need to start
  if [ $minutes -lt 60 ] ; then
    exit 0
  fi

  pro_on=$(ps aux | grep -c rsync)
done

This is the block to check if other rsync processes are running.
It is adapted from the following code „Letting your server take naps“:

https://distrowatch.com/weekly.php?issue=20120903

This will also be used later to shut down the server during idle hours.

  • With the first line you say that the „bash“ should be entrusted with the execution.
  • „minutes=1260“ means that the postponement of a start of this script should be maximum 21 hours. After that the script will not be executed, because the cron process will start already the next round of backup.
  • „pro_on=$(ps aux | grep -c rsync)“ counts the running processes whose entry contains „rsync“. The grep command itself also shows up as a process on the server, which is why you have to work with „-gt 1“, i.e. „greater than 1“.
  • After that it goes into a while loop which sleeps with „sleep 60“ for one minute each round and then checks if there is still more than one grep-rsync process running. The minutes are then always counted down by 1 and if one gets under 60 minutes until the 21 hours are over, then it is no longer worth running the script and „exit 0“ aborts. After that cron will try again at the regular time on the next day. This case occurs when another rsync process needs more than 20 hours.
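The waiting logic described above can also be written as a small function, which makes it easier to try out without a real rsync running. This is only a sketch under assumptions, not the code used in this post: the names wait_until_free and rsync_running, the parameters and the use of pgrep -x are chosen for illustration.

```bash
#!/bin/bash
# Sketch (hypothetical helper, not part of the original script).
# Waits while "$busy_check" reports a running process.
#   $1 = remaining minutes
#   $2 = seconds to sleep per round
#   $3 = name of a command or function that returns 0 while rsync is busy
# Returns 0 when the way is free, 1 when fewer than 60 minutes are left.
wait_until_free() {
    local minutes=$1 interval=$2 busy_check=$3
    while "$busy_check"; do
        sleep "$interval"
        minutes=$((minutes - 1))
        if [ "$minutes" -lt 60 ]; then
            return 1
        fi
    done
    return 0
}

# Assumed check: pgrep -x matches the exact process name,
# so no extra grep process has to be subtracted.
rsync_running() {
    pgrep -x rsync > /dev/null
}
```

In the backup script this could be called as „wait_until_free 1260 60 rsync_running || exit 0“, which corresponds to the 21 hours and the one-minute sleep used above.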

Part 2

# Simple backup with rsync
# local-mode, tossh-mode, fromssh-mode

SOURCES=(/Volumes/my_volume/ )
TARGET="/srv/backup_volume"

# edit or comment with "#"
MONTHROTATE=monthrotate # use DD instead of YYMMDD
RSYNCCONF=(--delete 
--exclude=/Volumes/Raid/.DocumentRevisions-V100
--exclude=/Volumes/Raid/.TemporaryItems --exclude=/Volumes/Raid/.Trashes 
--exclude=/Volumes/Raid/.apdisc)

MAILREC="my_home_mail@something.de"

SSHUSER="linux_backupuser"
FROMSSH="linux_backupuser@999.888.777.666"
#TOSSH="tossh-server"
SSHPORT=22222

MOUNT="/bin/mount"; FGREP="/bin/fgrep"; SSH="/usr/bin/ssh"
LN="/bin/ln"; ECHO="/bin/echo"; DATE="/bin/date"; RM="/bin/rm"
DPKG="/usr/bin/dpkg"; AWK="/usr/bin/awk"; MAIL="/usr/bin/mail"
CUT="/usr/bin/cut"; TR="/usr/bin/tr"; RSYNC="/usr/bin/rsync"
LAST="last"; INC="--link-dest=$TARGET/$LAST"; LS="/bin/ls"

This block is mostly taken from the original.

SOURCES is an array and can contain multiple sources, separated by a space.
SOURCES has a slash at the end, so that the folders /user/Volumes/Data_Folder are not produced each time.

Here is an example:

subpart of part 2:

SOURCES=(/srv/smb/source1/ /srv/smb/source2/ )
TARGET="/srv/my_backup"

SOURCES
If the SOURCE has a „/“ at the end then only the content is copied but not the last specified folder.

SOURCES is an array which will be accessed later with ${SOURCES[@]}. Multiple sources can be specified separated by spaces. There is then separate output in the log, because a „do-loop“ is processed for each SOURCE.
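A small sketch of how such an array behaves in a loop. The paths are placeholders, and the second one contains a space on purpose to show why the quotes around „${SOURCES[@]}“ matter.

```bash
#!/bin/bash
# Example values only; the paths are placeholders.
SOURCES=(/srv/smb/source1/ "/srv/smb/my source2/")

# "${SOURCES[@]}" expands to one word per array element,
# so a path with a space stays in one piece.
list_sources() {
    local src
    for src in "${SOURCES[@]}"; do
        echo "source: $src"
    done
}

list_sources
echo "number of sources: ${#SOURCES[@]}"
```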

TARGET
has no slash at the end, the slash is added later automatically if it is missing.

subpart of part 2:

RSYNCCONF=(--delete 
--exclude=/Volumes/Raid/.DocumentRevisions-V100
--exclude=/Volumes/Raid/.TemporaryItems --exclude=/Volumes/Raid/.Trashes 
--exclude=/Volumes/Raid/.apdisc)

„--delete“ means that files that are deleted in the source are also deleted on the target. Some dot files (.file) sometimes cause read problems on the source, so it is best to exclude them. The problem is that they change between the reading of the list of files to be copied and the copying process itself. This leads to an error.
Since RSYNCCONF is an array, more rsync flags can be placed here if there is a need.

subpart of part 2:

MONTHROTATE=monthrotate                 
# use DD instead of YYMMDD

Because each folder is named after the day of the month, the folder with the day number, e.g. „04“, is overwritten again on the 4th of the next month. If you do not want this, you can use the full date as the folder name instead. You will then get a new folder every day and should delete old ones from time to time.

The deletion and 30th or 31st of the month – MONTHROTATE

Deleting does not seem to delete the files that are still needed as they are hardlinks. These can be multiple links to a file and as long as a hardlink still exists (from another day-folder) the file is not deleted. New backups will then produce another NEW hardlink to the file.
That’s why it doesn’t seem to be a problem if you skip a day.
(At least that’s how I understood it, but don’t see more than one hardlink, in Linux „ls -la“ output).
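The hardlink behaviour can be tried out in a throw-away directory. The following is a minimal sketch that assumes GNU coreutils („stat -c“); the folder and file names are made up for the demonstration.

```bash
#!/bin/bash
# Sketch: hard links in a temporary directory (GNU "stat" assumed).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/01" "$demo_dir/02"

echo "important data" > "$demo_dir/01/file.txt"
# A second name for the same data, like an unchanged file in a new day folder
ln "$demo_dir/01/file.txt" "$demo_dir/02/file.txt"

# %h prints the number of hard links; "ls -l" shows it in the second column
links_before=$(stat -c %h "$demo_dir/02/file.txt")

# Removing the old day folder only removes one of the two names
rm -r "$demo_dir/01"
links_after=$(stat -c %h "$demo_dir/02/file.txt")
content_after=$(cat "$demo_dir/02/file.txt")

echo "links before: $links_before, after: $links_after, content: $content_after"
rm -r "$demo_dir"
```

The link count belongs to each individual file, so it shows up when you list the files inside a day folder, not in the listing of the day folders themselves.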

To track the sequence I added an „all-log“ file to the script, see below.

There is then the symbolic link of „last“ to the last backup folder in each case. The linked folder with „last“ is used for the comparison of old and new files. Only new files are copied to the backup.

subpart of part 2:

MAILREC="my_home_mail@something.de"

SSHUSER="linux_backupuser"
FROMSSH="linux_backupuser@999.888.777.666"
#TOSSH="tossh-server"
SSHPORT=22222

Change the values according to your settings.

(Bash variables do not have to be capitalized, this is just a style of the author here).

Via MAILREC a destination address for the log mails is specified. Attention: in the original script the mails contain the complete list of the saved files. This can be a few hundred MB, which then get stuck in the mail server. That’s why I later included only ERRORS in the log file (at least for the big backup). See below.

FROMSSH or TOSSH, we are backing up from a source here, hence FROMSSH. The argument expects the server connection. FROMSSH is important because then there is no access data to the backup on the server being backed up. This is extremely important for encryption trojans, because they look for exactly that and then encrypt the backup first.

subpart of part 2:

MOUNT="/bin/mount"; FGREP="/bin/fgrep"; 
SSH="/usr/bin/ssh";
LN="/bin/ln"; ECHO="/bin/echo"; DATE="/bin/date"; RM="/bin/rm";
DPKG="/usr/bin/dpkg"; AWK="/usr/bin/awk"; 
MAIL="/usr/bin/mail";
CUT="/usr/bin/cut"; 
TR="/usr/bin/tr"; RSYNC="/usr/bin/rsync";
LAST="last"; INC="--link-dest=$TARGET/$LAST"; LS="/bin/ls"

The individual programs are stored in variables with their full paths. This is an interesting way to call programs in the script and to make sure that they are found even if they are not listed in the PATH variable.

LAST specifies the name of the sym link folder. „last“ is then the sym link to the last backup.

„--link-dest“ is an rsync option; here it points to the folder behind LAST, so that unchanged files are hard-linked against the last backup.

Because I need the „ls“ list-command later I also put it with the full path into a variable. But „ls“ is not only available in bin but also in sbin, here I get it from bin.

Part 3

#zazu log all days in one file
all_logprint() {
    echo -e "\n $*" >> all-days-${0}.log
}

The log file is sent by email after each backup and then overwritten again. You are left with only the collected emails in your email programme, unless there were too many files and the emails were not sent because of their size.
That’s why I wanted to have an „all-days-log-file“ listing all backups, with ERRORS but without the complete backup file lists.
(Further down, I turned off the file list: Default-Out: $INC 2>> $LOG instead of 2>&1 which then lists normal and error output).
The statement is a function: „\n“ produces a new line and „$*“ contains all arguments you give to the function. Here you pass on the log text, and the whole thing is appended with „>>“ to the end of a separate file, which is created if it doesn’t exist. The „-e“ switch causes echo to interpret the backslash sequence; without it there is no line break.

Part 4

LOG=$0.log
$DATE > $LOG

„$0“ contains the name of the script at this point, so the log file is named like the script plus „.log“ (example: backup_script.sh.log).
After that the complete date/time is written into the log file. So you can see when the backup process started. At the end there is again a date/time entry, after the script is done. You will see how long it took.

Part 5

if [ "${TARGET:${#TARGET}-1:1}" != "/" ]; then
  TARGET=$TARGET/
fi

Since other folders are attached to the destination folder, it needs a slash at the end.
Here it is ensured that it is always there.
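The same check can be written more compactly with bash parameter expansion. This is just an alternative sketch; the function name with_trailing_slash is made up for the example.

```bash
#!/bin/bash
# Sketch: make sure a path ends with a slash.
# ${1%/} removes one trailing slash if there is one, then "/" is appended.
with_trailing_slash() {
    echo "${1%/}/"
}

with_trailing_slash "/srv/backup_volume"    # prints /srv/backup_volume/
with_trailing_slash "/srv/backup_volume/"   # prints /srv/backup_volume/
```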

Part 6

# zazu check whether today is the same number as the 
# symlink "last" to the numbered folder
# also no 2 backups on one day - gives an ERROR
HEUTE=$($DATE +%d)
cd $TARGET
LAST_SYMLINK=$($LS -l | grep ^l | grep -o "[0-9][0-9]$")
cd /home/sicherung/bin

First the day number of the month is written into the variable „HEUTE“ (German for „today“): 01 or 02 etc.
Then I change with „cd“ into the target directory and search there with „ls“ and the option „-l“ for a line that has an „l“ at the beginning, which means it is a symlink. Then I put the last two digits of that line (the folder name) into the variable „LAST_SYMLINK“.
After that I return with „cd“ to the directory where the scripts are located, so that later the log files are also written there and not in the target directory.
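An alternative that avoids parsing the „ls“ output and changing directories is to ask for the symlink target directly with „readlink“. The sketch below is an assumption, not the code used in this script; the helper name last_backup_name is hypothetical. It also returns an empty string when no „last“ symlink exists yet, for example on the very first run.

```bash
#!/bin/bash
# Sketch (hypothetical helper): print the folder name that the symlink
# "last" in directory $1 points to, or nothing if there is no such symlink.
last_backup_name() {
    local link="$1/last"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    fi
}

# Possible use in the script (an empty result means: no previous backup):
#   LAST_SYMLINK=$(last_backup_name "$TARGET")
#   if [ -z "$LAST_SYMLINK" ] || [ "$HEUTE" != "$LAST_SYMLINK" ]; then ...
```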

Part 7

if [ $HEUTE -ne $LAST_SYMLINK ]; then

… code parts 7a to 7e are here …

else
  $ECHO "Since one month no backup or only with ERROR (todays day of the month and day of last backup have the same number), kept the last backup from day ${HEUTE}, didn't overwrite the folder. No backup today, next backup is tomorrow." >> $LOG
  ERROR=1
fi

Long „if bracket“:
Here I say, if the symlink points to the same day number as today’s date, then everything must have gone wrong for at least a month. Either there was no backup for a month (raid on Mac was off) or all backups ended with an error and that’s why no symlink with „last“ was created anymore.

Therefore:
Only if this is not the case („-ne“, not equal), a new backup is started; otherwise the existing backup for that day is kept and a new attempt is made the next day.


Attention:
A new symlink „last“ is only created if the backup succeeds. Otherwise a broken or incomplete backup would sit behind the symlink, and rsync would then transfer everything again, all the many terabytes.


Further down there is an „else“, so if the day numbers are equal an error is generated and written into the log with an explanation.

Part 7a

if [ -z "$MONTHROTATE" ]; then
  TODAY=$($DATE +%y%m%d)
else
  TODAY=$($DATE +%d)
fi

If MONTHROTATE is empty („-z“ tests for a zero-length string), TODAY gets the full date („$DATE +%y%m%d“), otherwise only the day of the month („$DATE +%d“). The command sits inside „$( … )“, so that it is executed first and its output can then be assigned to the variable „TODAY“, which is also created here.
In the end the folder is called e.g. „01“ or „220101“.
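To see the two folder names for a fixed date, the logic can be wrapped in a small function. This is a sketch only: the function folder_name is hypothetical, and „date -d“ is a GNU date option that is used here just to make the result reproducible.

```bash
#!/bin/bash
# Sketch: folder name for a given date, with and without MONTHROTATE.
#   $1 = value of MONTHROTATE (may be empty), $2 = date such as 2022-01-01
folder_name() {
    local monthrotate=$1 day=$2
    if [ -z "$monthrotate" ]; then
        date -d "$day" +%y%m%d
    else
        date -d "$day" +%d
    fi
}

folder_name "monthrotate" "2022-01-01"   # prints 01
folder_name "" "2022-01-01"              # prints 220101
```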

Part 7b

if [ "$SSHUSER" ] && [ "$SSHPORT" ]; then
    S="$SSH -p $SSHPORT -l $SSHUSER";
fi

Here the SSH access part, which is later in the rsync command, is written into the variable „S“, which is also created with this command.

Part 7c

for SOURCE in "${SOURCES[@]}"
  do

#FROMSSH – this I use here

    if [ "$S" ] && [ "$FROMSSH" ] && [ -z "$TOSSH" ]; then
      $ECHO "$RSYNC -e \"$S\" -avR \"$FROMSSH:$SOURCE\" ${RSYNCCONF[@]} $TARGET$TODAY $INC" >> $LOG

# log only errors "2>> $LOG" since logging all files
# backuped will crash the mailer, original ">> $LOG 2>&1"

      $RSYNC -e "$S" -avR "$FROMSSH:$SOURCE" "${RSYNCCONF[@]}" "$TARGET"$TODAY $INC 2>> $LOG
      if [ $? -ne 0 ]; then
        ERROR=1
      fi
    fi

#TOSSH – I'm not using here

    if [ "$S" ] && [ "$TOSSH" ] && [ -z "$FROMSSH" ]; then
      $ECHO "$RSYNC -e \"$S\" -avR \"$SOURCE\" ${RSYNCCONF[@]} \"$TOSSH:$TARGET$TODAY\" $INC " >> $LOG

      $RSYNC -e "$S" -avR "$SOURCE" "${RSYNCCONF[@]}" "$TOSSH:\"$TARGET\"$TODAY" $INC >> $LOG 2>&1
      if [ $? -ne 0 ]; then
        ERROR=1
      fi
    fi

#Local Backup – I'm not using here

    if [ -z "$S" ]; then
      $ECHO "$RSYNC -avR \"$SOURCE\" ${RSYNCCONF[@]} $TARGET$TODAY $INC" >> $LOG

      $RSYNC -avR "$SOURCE" "${RSYNCCONF[@]}" "$TARGET"$TODAY $INC >> $LOG 2>&1
      if [ $? -ne 0 ]; then
        ERROR=1
      fi
    fi

done

Update April 2023
The old "$FROMSSH:\"$SOURCE\"" command stopped working with a software update of my router. (It is the second part of the FROMSSH block) It would produce an output like:

rsync ssh user@main-domain.de:"/my/folder-to-copy-from"

The rsync manpage says:

Access via remote shell:
    Pull:
        rsync [OPTION...] [USER@]HOST:SRC... [DEST]

So there are no quotes around the source folder. The argument is now:

"$FROMSSH:$SOURCE"

The SSH code block


There are three main ifs:
a. FROMSSH
b. TOSSH
c. local backup

Now here comes the actual rsync command loop (for – in – do).

The first block just writes the complete rsync command to the log file, so you can see what was executed.

The second block then actually executes rsync and writes what rsync reports to „stdout“ (standard out) and „stderr“ (standard error) to the log file:
„>> $LOG 2>&1“
Errors („2“) are also routed to stdout („&1“) and thus end up in the log file as well. In my version it is only „2>> $LOG“, so only errors are logged; otherwise the file becomes too large when 40 TB are transferred and every single file is logged.

If the exit status of the last command was not 0 („$? -ne 0“; 0 means success), the variable ERROR is set to 1.
This variable then produces the word „Error“ in the subject of the email.
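The difference between logging everything and logging only errors can be tried out without rsync. In this sketch the function fake_rsync is a made-up stand-in that behaves like a failing transfer; the exit status 23 is only an example of a non-zero value.

```bash
#!/bin/bash
# Sketch: a stand-in for rsync that prints a file list to stdout,
# an error message to stderr, and ends with a non-zero exit status.
fake_rsync() {
    echo "file1.txt"
    echo "rsync error: some files could not be transferred" >&2
    return 23
}

LOG=$(mktemp)
ERROR=""
status=0

# Only stderr (file descriptor 2) is appended to the log.
# stdout is discarded here just to keep the demonstration quiet.
fake_rsync 2>> "$LOG" > /dev/null || status=$?

if [ "$status" -ne 0 ]; then
    ERROR=1
fi

log_content=$(cat "$LOG")
rm -f "$LOG"

echo "log: $log_content"
echo "ERROR=$ERROR"
```

The log then contains the error line but not „file1.txt“, and ERROR is set to 1.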

The actual command of rsync:

  • $RSYNC contains the program name with path.
  • „-e“ specifies the remote shell to use, here SSH.
    (-e, --rsh=COMMAND specify the remote shell to use)
  • „$S“ contains the SSH connection data as specified above.
  • „-avR“ are the flags for rsync: a=archive, v=verbose, R=relative pathnames
  • „$FROMSSH“ contains user@ip-address
  • $SOURCE is the respective SOURCE that the do loop is currently processing.
  • „${RSYNCCONF[@]}“ the array with the rsync flags (e.g. --delete)
  • „$TARGET“$TODAY the target folder, followed by today’s folder name
  • $INC the link destination at the end, which points to the symlink „last“: --link-dest=$TARGET/last

update: Comments from Hacker News:

https://news.ycombinator.com/item?id=30465581

„rsync must be invoked with „--archive --xattrs --acls“ to guarantee complete file copies.“
The additional flags for the rsync command would be: „A“ and „X“.

man rsync:

-A, --acls                  preserve ACLs (implies -p)
-X, --xattrs                preserve extended attributes

Both deal with extended metadata of files. In my tests both options work fine on a Linux machine, but not when rsync-ing from a Mac, even with the latest rsync (3.2, 2020) installed on the Mac.

Bash basics

It is interesting what is escaped with „\“ in the script and what is in quotes.

Quoting with single 'x' or double "y" quotes means that the quoted text is treated as one contiguous argument, even if it contains spaces.

touch "File 1"
touch 'File 1'

For variables, double quotes "$variable" mean that the content of the variable is inserted here, whereas single quotes '$variable' leave the text untouched, so only the literal text $variable is printed.

Backticks `command` enclose a command whose output becomes a string; „$(command)“ does the same.

https://wiki.ubuntuusers.de/Shell/Bash-Skripting-Guide_für_Anfänger/
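A short sketch that shows these quoting rules side by side. The variable names and the helper count_args are made up for the example.

```bash
#!/bin/bash
# Sketch: how quoting changes what a command receives.
name="backup user"

single='$name'    # single quotes: no expansion, the literal text $name
double="$name"    # double quotes: the content of the variable, kept as one word
# $( ) inserts the output of the command
substituted=$(echo "$name" | tr '[:lower:]' '[:upper:]')

# Prints the number of arguments it was called with
count_args() { echo $#; }
unquoted_count=$(count_args $name)     # word splitting: two arguments
quoted_count=$(count_args "$name")     # one argument

echo "$single | $double | $substituted | $unquoted_count | $quoted_count"
# prints: $name | backup user | BACKUP USER | 2 | 1
```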

Part 7d

if [ "$S" ] && [ "$TOSSH" ] && [ -z "$FROMSSH" ]; then
  $ECHO "$SSH -p $SSHPORT -l $SSHUSER $TOSSH $LN -nsf $TARGET$TODAY $TARGET$LAST" >> $LOG
  $SSH -p $SSHPORT -l $SSHUSER $TOSSH "$LN -nsf \"$TARGET\"$TODAY \"$TARGET\"$LAST" >> $LOG 2>&1
  if [ $? -ne 0 ]; then
    ERROR=1
  fi
fi

In the case of TOSSH, the symlink is set here at the end. Does not apply to me.

Part 7e

## zazu added [ -z "$ERROR" ], no symlink if error

  if ( [ "$S" ] && [ "$FROMSSH" ] && [ -z "$TOSSH" ] && 
  [ -z "$ERROR" ] ) || ( [ -z "$S" ] ); then

    $ECHO "$LN -nsf $TARGET$TODAY $TARGET$LAST" >> $LOG

    $LN -nsf "$TARGET"$TODAY "$TARGET"$LAST >> $LOG 2>&1
      if [ $? -ne 0 ]; then
        ERROR=1
      fi
  fi

Here the symlink is set in the case of FROMSSH.
First the command is written into the log, then it is executed with $LN, which points to the command „ln“.
If the command prints something, it will be at the end of the log file.
In case of errors ERROR=1 is used for the subject of the mail message.

Added by me -z „$ERROR“:

subpart of part 7e

if ( [ "$S" ] && [ "$FROMSSH" ] && [ -z "$TOSSH" ] && [ -z "$ERROR" ] ) || ( [ -z "$S" ] );  then

No symlink is written if an error took place, that is „last“ will still point to the old complete backup. Otherwise the server makes the complete backup from the beginning in case of an error, which can mean a month transfer time with a big volume to backup.
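The effect of this condition can be tried out in a temporary directory. The following sketch is not the code of the script; the helper update_last and its parameters are hypothetical and only mirror the idea „set the symlink only if ERROR is empty“.

```bash
#!/bin/bash
# Sketch (hypothetical helper): point "last" at the new folder only when
# the error flag is empty.
#   $1 = target directory, $2 = new folder name, $3 = error flag
update_last() {
    local target=$1 today=$2 error=$3
    if [ -z "$error" ]; then
        # -s symbolic link, -f replace an existing link,
        # -n do not follow "last" if it already points to a directory
        ln -nsf "$target/$today" "$target/last"
    fi
}
```

Called with an empty third argument, the symlink moves to the new folder; called with „1“, the symlink keeps pointing to the last backup that finished without errors.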

subpart of part 7 – closing the if-else block

else
  $ECHO "Since one month no backup or only with ERROR (todays day of the month and day of last backup have the same number), kept the last backup from day ${HEUTE}, didn't overwrite the folder. No backup today, next backup is tomorrow." >> $LOG
  ERROR=1
fi

Attention:
The „else“ belongs to the outer if block, which checks whether today’s day number is the same as the one the symlink „last“ points to.

If the symlink is not updated there is a larger explanatory error message.

Part 8

$DATE >> $LOG

At the end of the actual process, the full date/time is written to the log again, so you can see how long the backup took and when it finished.

Part 9 – mail a report

if [ -n "$MAILREC" ]; then
  if [ $ERROR ]; then
    $MAIL -s "Error Backup $LOG" $MAILREC < $LOG
    # zazu all log
    all_logprint "ERROR _ _ _ _ _ _ _ _ _ _ _ \n" "$(< $LOG)"
  else
    $MAIL -s "Backup $LOG" $MAILREC < $LOG
    # zazu all log
    all_logprint "$(< $LOG)"
  fi
fi

If MAILREC is not null (-n),
then if ERROR write in the subject „ERROR backup + the name of the log file $LOG“ otherwise just „backup + the name of the log file“.
$LOG contains the name of the log file
< $LOG evaluates the content of the file which is in $LOG

Below this I call the function „all_logprint“ defined at the beginning of the script. In the error case the echo command of the function receives the text „ERROR _ _ _ \n“ (the underscores are just there so you can spot the ERROR text more easily, and „\n“ adds a line break) plus the content of the log file.
„$(< $LOG)“ reads the content of the file whose name is stored in the variable LOG; the function then writes it to the „all-days-name-of-the-bu-script.sh.log“ file.

Here again the function from the beginning of the script:

subpart of part 3

all_logprint() {
    echo -e "\n $*" >> all-days-${0}.log
}

The file „all-days-bu-script-name.sh.log“ is then next to the „bu-script-name.sh.log“.

In the „all-days“ the single backups are listed, only with the errors and the information about runtime start, runtime end, executed commands and possibly outputs via stdout.

With this the script is finished.

size matters – oversized coolers for less power consumption and less noise

Make the script executable

If you want to execute the script it must have execute rights:

chmod u+x $HOME/bin/bu-script-name.sh

„u“ stands for the user, „g“ for group and „a“ for all.

Now you can run the script directly in the folder with the following command:

./bu-script-name.sh

Run the script every day with cron

To run the script every day, you need to add an entry to the crontab of the Linux backup user that owns the script; this user should not have any sudo rights.

crontab -e

Run the script every day at 23:00 h. The script is located in a bin folder of the backupuser inside his home directory.

#zazu backup
0 23 * * * /home/backupuser/bin/bu-script-name.sh

The output structure of „monthrotate“

drwxrwxr-x 25 backupuser backupuser 2343 Sep 20 23:10 20
drwxrwxr-x 44 backupuser backupuser 2555 Sep 21 23:10 21
drwxrwxr-x 32 backupuser backupuser 5678 Sep 22 23:10 22
drwxrwxr-x 56 backupuser backupuser 4567 Sep 23 23:10 23
drwxrwxr-x 66 backupuser backupuser 7564 Sep 24 23:10 24
drwxrwxr-x 43 backupuser backupuser 4567 Sep 25 23:10 25
lrwxrwxrwx 78 backupuser backupuser 7564 Sep 25 23:10 last -> /srv/backup_folder/25

The last successful backup was completed on 25 September in the folder named „25“, and a symlink named „last“ was created that points to this folder „25“.

The log output in the folder of the script

/home/backupuser/bin/

bu-script-name.sh
bu-script-name.sh.log
all-days-bu-script-name.sh.log

Why only an „almost perfect“ script?

There is always room for improvement. Any improvements and comments are welcome.

Interested in hardware?

How to build a very quiet and environmental friendly 240 TB backup server ->

How to build a very quiet and environmental friendly office server (German) ->

Author:
Thomas Hezel
zazu.berlin 2022
– Version 1.0

Finally the complete script

Link to the script on Bitbucket

#!/bin/bash

# avoiding collisions with other rsync processes

#the minutes passed to the script 21h*60min = 1260 minutes
minutes=1260

# Make sure no one else is using rsync
pro_on=$(ps aux | grep -c rsync)

# if someone is using rsync
# grep is also producing one entry so -gt 1

while [ $pro_on -gt 1 ]
do
  sleep 60
  minutes=$(($minutes - 1))

  # we are close to the next day no need to start
  if [ $minutes -lt 60 ] ; then
    exit 0
  fi

  pro_on=$(ps aux | grep -c rsync)
done


# Simple backup with rsync
# local-mode, tossh-mode, fromssh-mode

SOURCES=(/Volumes/raid/ )
TARGET="/srv/backup"

# edit or comment with "#"
MONTHROTATE=monthrotate                 # use DD instead of YYMMDD
RSYNCCONF=(--delete --exclude=/Volumes/raid/.DocumentRevisions-V100 --exclude=/Volumes/raid/.TemporaryItems --exclude=/Volumes/raid/.Trashes --exclude=/Volumes/raid/.apdisc)
MAILREC="webmaster@zazu.berlin.please.change"

SSHUSER="backupuser"
FROMSSH="backupuser@999.888.77.66"
#TOSSH="tossh-server"
SSHPORT=2222

### a lot of zazu edits ###

MOUNT="/bin/mount"; FGREP="/bin/fgrep"; SSH="/usr/bin/ssh"
LN="/bin/ln"; ECHO="/bin/echo"; DATE="/bin/date"; RM="/bin/rm"
DPKG="/usr/bin/dpkg"; AWK="/usr/bin/awk"; MAIL="/usr/bin/mail"
CUT="/usr/bin/cut"; TR="/usr/bin/tr"; RSYNC="/usr/bin/rsync"
LAST="last"; INC="--link-dest=$TARGET/$LAST"; LS="/bin/ls"

#zazu log all days in one file
all_logprint() {
    echo -e "\n $*" >> all-days-${0}.log
}

LOG=$0.log
$DATE > $LOG

if [ "${TARGET:${#TARGET}-1:1}" != "/" ]; then
  TARGET=$TARGET/
fi

#zazu check whether today is the same number as the symlink "last" to the numbered folder
#also no 2 backups on one day - gives an ERROR
HEUTE=$($DATE +%d)
cd $TARGET
LAST_SYMLINK=$($LS -l | grep ^l | grep -o "[0-9][0-9]$")
cd /home/backupuser/bin

if [ $HEUTE -ne $LAST_SYMLINK ]; then

  if [ -z "$MONTHROTATE" ]; then
    TODAY=$($DATE +%y%m%d)
  else
    TODAY=$($DATE +%d)
  fi

  if [ "$SSHUSER" ] && [ "$SSHPORT" ]; then
    S="$SSH -p $SSHPORT -l $SSHUSER";
  fi

  for SOURCE in "${SOURCES[@]}"
    do
      if [ "$S" ] && [ "$FROMSSH" ] && [ -z "$TOSSH" ]; then
        $ECHO "$RSYNC -e \"$S\" -avR \"$FROMSSH:$SOURCE\" ${RSYNCCONF[@]} $TARGET$TODAY $INC"  >> $LOG
        #log only errors "2>> $LOG" since logging all files backuped will crash the mailer, original ">> $LOG 2>&1"
        $RSYNC -e "$S" -avR "$FROMSSH:$SOURCE" "${RSYNCCONF[@]}" "$TARGET"$TODAY $INC 2>> $LOG
        if [ $? -ne 0 ]; then
          ERROR=1
        fi
      fi
      if [ "$S" ]  && [ "$TOSSH" ] && [ -z "$FROMSSH" ]; then
        $ECHO "$RSYNC -e \"$S\" -avR \"$SOURCE\" ${RSYNCCONF[@]} \"$TOSSH:$TARGET$TODAY\" $INC " >> $LOG
        $RSYNC -e "$S" -avR "$SOURCE" "${RSYNCCONF[@]}" "$TOSSH:\"$TARGET\"$TODAY" $INC >> $LOG 2>&1
        if [ $? -ne 0 ]; then
          ERROR=1
        fi
      fi
      if [ -z "$S" ]; then
        $ECHO "$RSYNC -avR \"$SOURCE\" ${RSYNCCONF[@]} $TARGET$TODAY $INC"  >> $LOG
        $RSYNC -avR "$SOURCE" "${RSYNCCONF[@]}" "$TARGET"$TODAY $INC  >> $LOG 2>&1
        if [ $? -ne 0 ]; then
          ERROR=1
        fi
      fi
  done

  if [ "$S" ] && [ "$TOSSH" ] && [ -z "$FROMSSH" ]; then
    $ECHO "$SSH -p $SSHPORT -l $SSHUSER $TOSSH $LN -nsf $TARGET$TODAY $TARGET$LAST" >> $LOG
    $SSH -p $SSHPORT -l $SSHUSER $TOSSH "$LN -nsf \"$TARGET\"$TODAY \"$TARGET\"$LAST" >> $LOG 2>&1
    if [ $? -ne 0 ]; then
      ERROR=1
    fi
  fi

  ## zazu added [ -z "$ERROR" ], no symlink if error

  if ( [ "$S" ] && [ "$FROMSSH" ] && [ -z "$TOSSH" ] && [ -z "$ERROR" ] ) || ( [ -z "$S" ] );  then
    $ECHO "$LN -nsf $TARGET$TODAY $TARGET$LAST" >> $LOG
    $LN -nsf "$TARGET"$TODAY "$TARGET"$LAST  >> $LOG 2>&1
    if [ $? -ne 0 ]; then
      ERROR=1
    fi
  fi

else
  $ECHO "Since one month no backup or only with ERROR (todays day of the month and day of last backup have the same number), kept the last backup from day ${HEUTE}, didn't overwrite the folder. No backup today, next backup is tomorrow." >> $LOG
  ERROR=1
fi

$DATE >> $LOG

if [ -n "$MAILREC" ]; then
  if [ $ERROR ];then
    $MAIL -s "Error Backup $LOG" $MAILREC < $LOG
    #zazu all log
    all_logprint "ERROR _ _ _ _ _ _ _ _ _ _ _ \n" "$(< $LOG)"
  else
    $MAIL -s "Backup $LOG" $MAILREC < $LOG
    #zazu all log
    all_logprint "$(< $LOG)"
  fi
fi

11 comments

  1. How about generating database backups then pulling them over ssh?

    • Hello xpil,
      if you have a cron-job running that is dumping the database, this script will then treat it like a regular file and include it in the backup.
      Thomas

  2. Benjamin

    Could you please provide a link to download the script with the changes implemented in it?

    • Hello Benjamin,
      the complete script is at the end of the blog now, for copy and paste.
      Thomas

  3. Mike

    In part one, where you ‘grep -c rsync’, you could instead ‘grep [r]sync’ which will omit the grep itself from the returned process list. The square braces make a regex… it’s a neat trick. Then you can just check the value of $? to see if the grep exited with a 1 or a 0.

    Thanks for the article.

  4. Janko

    please add „| wc -l“ here –> LAST_SYMLINK=$($LS -l | grep ^l | grep -o "[0-9][0-9]$" | wc -l)

    • Hello Janko,
      Using „wc“ word count „-l“ would give the number of new lines and not the text (number) of the last symlink name.
      The names of the symlinks are from „01“ to „31“ these digits we are looking for here.
      Lines would not work.

  5. Jan

    LAST_SYMLINK=$($LS -l | grep ^l | grep -o "[0-9][0-9]$")
    cd /home/backupuser/bin

    if [ $HEUTE -ne $LAST_SYMLINK ]; then

    This fails when the script is started for the very first time, since there are no prior backups at all.
    Also using a fixed path in the middle of the script is not a good idea, how about replacing it with a variable, initialized at the top of the script?
