A simple Python script for backing up a PostgreSQL database and uploading it to Amazon S3
Here’s a very simple Python script I currently use to create a compressed PostgreSQL database backup for my Django app. Since my database is very small and I don’t see it becoming big anytime soon, I create the backup locally and send a copy to Amazon S3.
To keep it very simple, I have it set to do hourly backups for 24 hours and daily backups for 1 year (365 days). For the hourly backups, I simply append the letter ‘h’ plus the hour in the filename so files over 24 hours old will automatically get overwritten. Same process for the daily backups, I just append the letter ‘d’ plus the day of the year in the filename and files older than 365 days will be overwritten. I originally wanted to keep the daily and hourly backups in separate folders but decided to just put them all in one folder for simplicity as I can identify from the filename which is hourly or daily.
The script takes an argument: ‘hourly’ or ‘daily’. I just run the command with crontab passing the appropriate argument based on schedule.
To run the script you will need to install boto (pip install boto). Just change the constants at the beginning of the script to match your setup.
import os import sys import subprocess from optparse import OptionParser from datetime import datetime import boto from boto.s3.key import Key DB_USER = 'databaseuser' DB_NAME = 'databasename' BACKUP_PATH = r'/webapps/myapp/db_backups' FILENAME_PREFIX = 'myapp.backup' # Amazon S3 settings. AWS_ACCESS_KEY_ID = os.environ['AWS_ACCESS_KEY_ID'] AWS_SECRET_ACCESS_KEY = os.environ['AWS_SECRET_ACCESS_KEY'] AWS_BUCKET_NAME = 'myapp-db-backups' def main(): parser = OptionParser() parser.add_option('-t', '--type', dest='backup_type', help="Specify either 'hourly' or 'daily'.") now = datetime.now() filename = None (options, args) = parser.parse_args() if options.backup_type == 'hourly': hour = str(now.hour).zfill(2) filename = '%s.h%s' % (FILENAME_PREFIX, hour) elif options.backup_type == 'daily': day_of_year = str(now.timetuple().tm_yday).zfill(3) filename = '%s.d%s' % (FILENAME_PREFIX, day_of_year) else: parser.error('Invalid argument.') sys.exit(1) destination = r'%s/%s' % (BACKUP_PATH, filename) print 'Backing up %s database to %s' % (DB_NAME, destination) ps = subprocess.Popen( ['pg_dump', '-U', DB_USER, '-Fc', DB_NAME, '-f', destination], stdout=subprocess.PIPE ) output = ps.communicate()[0] for line in output.splitlines(): print line print 'Uploading %s to Amazon S3...' % filename upload_to_s3(destination, filename) def upload_to_s3(source_path, destination_filename): """ Upload a file to an AWS S3 bucket. """ conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) bucket = conn.get_bucket(AWS_BUCKET_NAME) k = Key(bucket) k.key = destination_filename k.set_contents_from_filename(source_path) if __name__ == '__main__': main()
Some things you might want to add to the script are catching exceptions and sending out an email alert when an error occurs. I’ve had it running for about 10 days now and it’s been working very well so far.
Tags: devops, tech, database, postgresql, aws