Backup your Postgres database like a pro

Periodically backup your Postgres database from your EC2 Instance to S3 Bucket using python

Nowadays, our applications are hosted in the cloud on services like AWS and DigitalOcean. Instead of going for a database-as-a-service, where everything is managed and controlled by the provider, we might choose the opposite, i.e. running the database on our own cloud server instance, due to cost or other constraints. Since everything is done our way, let's not forget the most crucial part as well: periodic backups of your client data. So in this post, we will discuss how to back up a Postgres database hosted on AWS EC2 and store the database dumps in AWS S3.

Note: I am assuming you already have a working AWS account with access to EC2 and S3. I'm also assuming that your Postgres database is hosted on one of your EC2 instances.

Creating S3 bucket and policy

To create the bucket and policy, log into the AWS console and go to S3.

  • In the Create Bucket section of the page, give your backup bucket a name. You can use something simple like my-db-backup or something longer like mydatabasebackup; keep in mind that bucket names must be globally unique.

  • Under Permissions, keep Block all public access enabled (it is on by default); a backup bucket should never be publicly readable.

  • Click Create at the bottom of this page to finish creating your bucket!
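
The console is the quickest way, but if you would rather script this step, the same bucket can be created with boto3. This is an optional sketch and assumes your local AWS credentials already have S3 permissions; the bucket name is just the example from above.

import boto3

# Assumes AWS credentials are already configured locally (e.g. via aws configure).
s3 = boto3.client("s3", region_name="us-east-1")

# Bucket names are globally unique, so the example name may already be taken.
# Outside us-east-1 you also need to pass a CreateBucketConfiguration with a
# LocationConstraint matching your region.
s3.create_bucket(Bucket="my-db-backup")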

Setting up the IAM user and permissions

Create a new user whose only permission is to upload to the bucket we just created above. To do that, follow the steps below:

  • Inside the AWS console, search for IAM and click on Policies
  • Next, click on the Create policy button at the top right corner of the console, switch to the JSON tab, and paste the code given below:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject"],
      "Resource": "arn:aws:s3:::<bucket>/<object>/*"
    }
  ]
}

NOTE: Replace <bucket> with the bucket you created for your database backups, example: my-db-backup, and then replace <object> with the folder you would like those dumps to be uploaded to, example: backups

  • Click the Next buttons until you reach the Review policy page and give your policy a name. If you messed up the process in between, the official IAM documentation walks through policy creation; otherwise, move on to the next step.
  • Now go to Users, click Add user, give it a name, attach the policy you just created, and generate an access key for programmatic access.
  • Save the ACCESS KEY & SECRET KEY provided for the user you just created in the step above; we will sanity-check them right below.
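
Before wiring anything up on the EC2 instance, it's worth checking that the new user can actually write to the bucket. Here is a minimal sketch using boto3; the my-db-backup bucket name and the backups/ prefix are just the examples from the policy above, so swap in your own values.

import boto3

# Paste the ACCESS KEY / SECRET KEY of the upload-only user created above.
s3 = boto3.client(
    "s3",
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)

# The policy only allows PutObject under backups/, so this small test upload
# should succeed while listing or deleting objects would be denied.
s3.put_object(Bucket="my-db-backup", Key="backups/test.txt", Body=b"hello")
print("upload ok")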

Installing dependencies

This article is based on a repository hosted on GitHub. Please clone it and have a look at the script and the dependencies listed in the requirements.txt file.

Since we are using Python to back up the database, we need to install dependencies like boto3 (the AWS SDK for Python) on our EC2 instance. For that, follow the steps given below:

  • SSH to your EC2 Instance using this command: ssh -i path/to/AccessKey.pem ubuntu@ec2-12-34-567-890.compute-1.amazonaws.com
  • Install python dependencies using pip3 install -r requirements.txt
  • Rename .sample.env to .env and add the following credentials
S3_BUCKET=<bucket>
DATABASE_USERNAME=<user>
AWS_ACCESS_KEY_ID=<access-key>
AWS_SECRET_ACCESS_KEY=<secret-key>
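
The script reads these values at runtime. I can't promise the repository does it exactly this way, but a common pattern is to load the .env file with python-dotenv and read the values from the environment, roughly like this:

import os

from dotenv import load_dotenv  # assumes python-dotenv is listed in requirements.txt

load_dotenv()  # copies the key=value pairs from .env into os.environ

S3_BUCKET = os.environ["S3_BUCKET"]
DATABASE_USERNAME = os.environ["DATABASE_USERNAME"]
# boto3 picks up AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the
# environment automatically, so they don't need to be passed explicitly.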

Running script manually

After completing all the steps above, you will finally be able to run the actual script. Go ahead and run it from the root of the cloned project:

python3 run.py
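
If you want to know roughly what the script does before trusting it with your data, here is a minimal sketch of the idea: dump the database with pg_dump and push the file to S3 with boto3. This is an illustration, not the repository's exact code; the /tmp path, the backups/ prefix, and the assumption that pg_dump is available on the instance are all mine.

import os
import subprocess
from datetime import datetime

import boto3

S3_BUCKET = os.environ["S3_BUCKET"]
DATABASE_USERNAME = os.environ["DATABASE_USERNAME"]

# Name the dump after the current timestamp so older backups are never overwritten.
dump_file = f"/tmp/backup-{datetime.utcnow():%Y-%m-%d-%H%M%S}.sql"

# pg_dump must be installed on the instance; add -h/-p/-d flags as your setup requires.
subprocess.run(["pg_dump", "-U", DATABASE_USERNAME, "-f", dump_file], check=True)

# boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the environment.
s3 = boto3.client("s3")
s3.upload_file(dump_file, S3_BUCKET, f"backups/{os.path.basename(dump_file)}")

os.remove(dump_file)  # clean up the local dump once it is safely in S3
print(f"Uploaded backup to s3://{S3_BUCKET}/backups/{os.path.basename(dump_file)}")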

Running script like a pro

The whole point of writing this well-guided article (lol) is to do this like a professional, the way it was supposed to be done. There are numerous ways to automate it, and it's fine if you want to go the extra mile with services like Airflow, GitHub Actions, PipeDream, and so on, but we will be using cron jobs this time.

  • Run the following command to set up the cron job:

    sudo service cron start (if not running)
    crontab -e
    
  • Now paste this cron entry to run the script every night at 02:00 AM NEPAL TIME (UTC+5:45). Cron uses the server's clock, which is UTC by default on an EC2 instance, so 02:00 NPT translates to 20:15 UTC

    15 20 * * * python3 /home/ubuntu/backup-my-postgres/run.py
    

NOTE: I'm using my native timezone; please adjust the schedule to your own timezone if you care about when the backup runs.

That's it. No further steps. That is how you run your backup script.


The main motivation behind writing this article is to help a fellow internet stranger who might get overwhelmed by tools like minio, s3cmd, and awscli. Those tools can deliver greater results but need various bells & whistles to configure. I also tried to write this guide using bash, but it didn't feel like home, hence the Python.

I hope you guys found this article useful. Please don't share or recommend this article to your parents or relatives who might go "wtf, why me?", but to those who might actually need it and will remember you for it their whole life. Drop a star on my repo (it's free) if you feel like it ;).

Written with ❤️ in Nepal