Automated backups in under 30 minutes

Technical · March 16, 2013

Especially when you have nifty tools like the Backup gem readily available.

It took us under 30 minutes to get started with the Backup gem from scratch and configure it to do the following:

  1. Every hour, take an SQL dump of our Postgres DB and save it to Amazon S3, keeping the last 7 days' worth of backups.
  2. Every hour, sync the images folder (user-uploaded images) to a bucket in Amazon S3.

Here’s what’s missing from our setup:

  1. To guard against something (or someone) accidentally deleting images, we should probably set up a way to take fresh backups (not a sync) of the images folder every week, or modify the sync behaviour to only upload new images, never delete them. (A rough sketch of the weekly approach follows this list.)
  2. A way to restore backups automatically. While the Backup gem does a brilliant job of automating your backups, it doesn't have any support for automated restores.
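
For the first gap, the Backup gem also supports plain file archives alongside syncers, so a separate weekly model could tar up the images directory and push a fresh, self-contained copy to S3. Here's a rough, untested sketch of what that could look like (the trigger name, retention and schedule below are our assumptions, not something we're actually running yet):

Backup::Model.new(:weekly_image_archive, 'Weekly full archive of public/system/images/') do
  # Bundle the whole images directory into a tarball -- a fresh copy on every
  # run, unlike the syncer described further below
  archive :user_images do |archive|
    archive.add "public/system/images/"
  end

  compress_with Bzip2

  store_with S3 do |s3|
    s3.path = "/weekly_image_archive"
    s3.keep = 8    # roughly two months of weekly archives
  end

  notify_by Mail do |mail|
    mail.on_success = false
    mail.on_warning = false
    mail.on_failure = true
  end
end

This would be triggered from a weekly cron entry along the same lines as the hourly ones shown later in this post.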

Even though we haven’t suffered any data loss in production, we’ve already used the backups to construct the production environment locally — to test a few complicated DB migration scripts before rolling them out to production.

If you aren’t already backing up your data, get started with the Backup gem now! It even generates configuration file templates for you through its generator scripts! It might even take you less than 30 minutes!
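
For instance, a skeleton close to the database config shown below can be generated with something along these lines (the exact flag names can differ between Backup gem versions; backup help generate:model lists what your version supports):

backup generate:model --trigger db_backup \
  --databases="postgresql" --storages="s3" \
  --compressors="bzip2" --notifiers="mail"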

Here’s how we’ve set it up…

The Setup

Here’s a quick illustration of how the Backup gem works at a conceptual level. This may not be a true representation of how the data actually flows within the system.

Conceptual illustration of the Backup gem

You want to periodically get data from your data sources into your backup storage. Along the way you may want to process your data through one or more “steps”: compressing, encrypting, splitting it into smaller chunks (to ease the process of transferring large backup files), or cycling (e.g. retaining only the last 3 months of backups).
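
In the Backup gem's configuration DSL those concepts map onto a “model”, roughly like the skeleton below (purely illustrative; the full working configs follow):

Backup::Model.new(:example_backup, 'Illustration of the conceptual flow') do
  database PostgreSQL do |db|   # data source
    # connection details go here
  end

  compress_with Bzip2           # processing step: compress the dump
  split_into_chunks_of 250      # processing step: split large archives

  store_with S3 do |s3|         # backup storage
    s3.keep = 168               # cycling: keep only the last 168 backups
  end
end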

Let’s start with the entries that landed up on the crontab:

# m h  dom mon dow   command
0 */1 * * * /bin/bash -l -c 'rvm gemset use b2b && cd /home/nanda/b2b/current && backup perform -r lib/backup/ -t db_backup' >> /tmp/backup.log 2>&1
0 */1 * * * /bin/bash -l -c 'rvm gemset use b2b && cd /home/nanda/b2b/current && backup perform -r lib/backup/ -t image_backup' >> /tmp/backup.log 2>&1

The db_backup and image_backup scripts (the Backup gem calls them “models”, for some strange reason) are set to run every hour. (Side rant: Why can’t cron ship with sensible defaults for the shell it fires up? It took me 10 minutes – out of the total 30 minutes – fiddling with the various bash and rvm settings to get this running properly!)
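
If you'd rather not wrap every job in /bin/bash -l -c, one alternative we haven't battle-tested is to declare the shell at the top of the crontab and point BASH_ENV at a file that loads rvm (assuming a per-user rvm install under ~/.rvm); cron then runs each entry through bash with rvm already available:

SHELL=/bin/bash
BASH_ENV=/home/nanda/.rvm/scripts/rvm
0 */1 * * * cd /home/nanda/b2b/current && rvm gemset use b2b && backup perform -r lib/backup/ -t db_backup >> /tmp/backup.log 2>&1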

Database backup

Here’s our config file for backing up the production Postgres DB as an SQL dump (the data source) to Amazon S3 (the backup storage), retaining the last 168 backups (cycling), compressing the SQL dump with bzip2, and splitting it into 250MB chunks if the dump crosses that size. Finally, we’ve set up email notifications to let us know if something goes wrong while performing a backup.

Backup::Model.new(:db_backup, 'Main database backup') do
  # Split the backup file into chunks of 250 megabytes
  # if the backup file size exceeds 250 megabytes
  split_into_chunks_of 250

  ##
  # PostgreSQL [Database]
  #
  database PostgreSQL do |db|
    db.name               = "name"
    db.username           = "username"
    db.password           = "password"
    db.host               = "localhost"
    db.port               = 5432
    # In case you want to skip some tables or back up only a few tables
    # db.skip_tables        = ["skip", "these", "tables"]
    # db.only_tables        = ["only", "these", "tables"]
    db.additional_options = ["-xc", "-E=utf8"]
  end

  ##
  # Amazon Simple Storage Service [Storage]
  # The Amazon S3 credentials are picked up from the global config if not specified here
  store_with S3 do |s3|
    s3.path              = "/"
    # The cron job runs this backup every hour. The following line asks
    # the Backup gem to retain the last 168 backups, i.e. 7 days' worth of hourly backups
    s3.keep              = 168
  end

  ##
  # Bzip2 [Compressor]
  #
  compress_with Bzip2

  # Notify me by email if anything goes wrong...
  notify_by Mail do |mail|
    mail.on_success = false
    mail.on_warning = false
    mail.on_failure = true
  end
end
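
Since restores aren’t automated, restoring one of these dumps onto another machine is a manual affair. The commands below are only a sketch from memory: the timestamped S3 path and the layout inside the tar may differ on your version of the gem, so browse the bucket and inspect the extracted archive rather than trusting the exact paths (and if the dump was split into chunks, concatenate the .tar-aa, .tar-ab, etc. pieces with cat before extracting):

s3cmd get s3://bucket-name/db_backup/2013.03.16.10.00.01/db_backup.tar   # or any other S3 client
tar -xf db_backup.tar
bunzip2 db_backup/databases/PostgreSQL.sql.bz2
psql -U username -d name -f db_backup/databases/PostgreSQL.sql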

Image backup via sync

Here’s how we’re syncing all the images uploaded by our users (photos linked to various trips) to Amazon S3. Note that this syncs the public/system/images directory with a bucket in S3 instead of blindly uploading all the images each time the backup is performed (it would be really expensive to upload potentially many GBs of images every hour, or even every day). Only photos that have been added to the directory since the last backup are uploaded. What this also means is that any photos deleted from the directory are also deleted from S3 (potentially dangerous).

Backup::Model.new(:image_backup, 'Image sync for public/system/images/') do
  ##
  # Amazon S3 [Syncer]
  # Mirroring:
  #
  #   When enabled it will keep an exact mirror of your filesystem on S3.
  #   This means that when you remove a file from the filesystem,
  #   it will also be removed from S3.
  #
  sync_with Cloud::S3 do |s3|
    s3.access_key_id     = "access ID"
    s3.secret_access_key = "secret key"
    s3.region            = "ap-southeast-1"
    s3.bucket            = "bucket-name"
    s3.path              = "/image_backup"
    s3.mirror            = true

    #   `concurrency_type` may be set to:
    #     - false (default)
    #     - :threads
    #     - :processes
    s3.concurrency_type  = false

    #   Set `concurrency_level` to the number of threads/processes to use.
    #   Defaults to 2.
    s3.concurrency_level = 2

    s3.directories do |directory|
      directory.add "public/system/images/"
    end
  end

  notify_by Mail do |mail|
    mail.on_success = false
    mail.on_warning = false
    mail.on_failure = true
  end
end
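
And if the delete-propagation mentioned above is the bigger worry, the simpler of the two fixes from the wishlist at the top is presumably just to turn mirroring off, so the syncer only ever uploads and never deletes (we haven’t made this switch yet):

# Upload new and changed files, but never delete anything from S3
s3.mirror = false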

Common settings

Finally, here’s the common settings file for defining things that are common across various backups — email settings, AWS credentials, GPG credentials, etc.

##
# Global Configuration
# Add more (or remove) global configuration below
Backup::Storage::S3.defaults do |s3|
  s3.access_key_id     = "access key"
  s3.secret_access_key = "secret key"
  s3.region            = "ap-southeast-1"
  s3.bucket            = "bucket name"
  s3.keep              = 168
end

Backup::Notifier::Mail.defaults do |mail|
  mail.from                 = 'Backup Error '
  mail.to                   = 'saurabh@www.vacationlabs.com'
  mail.address              = 'smtp.gmail.com'
  mail.port                 = 587
  mail.domain               = 'www.vacationlabs.com'
  mail.user_name            = 'services@www.vacationlabs.com'
  mail.password             = 'password'
  mail.authentication       = 'plain'
  mail.enable_starttls_auto = true
end

##
# Load all models from the models directory (after the above global configuration blocks)
Dir[File.join(File.dirname(Config.config_file), "models", "*.rb")].each do |model|
  instance_eval(File.read(model))
end
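
For reference, the -r lib/backup/ flag in the crontab points the gem at the directory that holds all of this. Our layout looks roughly like the following (the model file names are inferred from the triggers, so treat them as indicative):

lib/backup/
├── config.rb            # the common settings above
└── models/
    ├── db_backup.rb     # database backup model
    └── image_backup.rb  # image sync model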
