神刀安全网

PGHoard: Tools for making PostgreSQL backups to cloud object storages

PGHoard is the cloud backup and restore solution we’re using in Aiven . We started PGHoard development in early 2015 when the Aiven project was launched as a way to provide real-time streaming backups of PostgreSQL to a potentially untrusted cloud object storage.

PGHoard has an extensible object storage interface, which currently works with the following cloud object stores:

  • Amazon Web Services S3
  • Google Cloud Storage
  • OpenStack Swift
  • Ceph’s RADOSGW utilizing either the S3 or Swift drivers 
  • Microsoft Azure Storage ( currently experimental )

Data integrity

PostgreSQL backups consist of full database backups, basebackups , plus write ahead logs and related metadata, WAL . Both basebackups and WAL are required to create and restore a consistent database.

PGHoard handles both the full, periodic backups (driving pg_basebackup ) as well as streaming the write-ahead-log of the database.  Constantly streaming WAL as it’s generated allows PGHoard to restore a database to any point in time since the oldest basebackup was taken.  This is used to implement Aiven’s Database Forks and Point-in-time-Recovery as described in ourPostgreSQL FAQ.

To save disk space and reduce the data that needs to be sent over the network (potentially incurring extra costs) backups are compressed by default using Google’s Snappy , a fast compression algorithm with a reasonable compression ratio. LZMA (a slower algorithm with very high compression ratio) is also supported.

To protect backups from unauthorized access and to ensure their integrity PGHoard can also transparently encrypt and authenticate the data using RSA, AES and SHA256.  Each basebackup and WAL segments gets a unique random AES key which is encrypted with RSA.  HMAC-SHA256 is used for file integrity checking.

Restoration is key

As noted in the opening paragraph, PGHoard is a backup and restore tool: backups are largely useless unless they can be restored.  Experience tells us that backups, even if set up at some point, are usually not restorable unless restore is routinely tested, but experience also shows that backup restoration is rarely practiced unless it’s easy to do and automate.

This is why PGHoard also includes tooling to restore backups, allowing you to create new master or standby databases from the object store archives.  This makes it possible to set up a new database replica with a single command, which first restores the database basebackup from object storage and then sets up PostgreSQL’s recovery.conf to fetch the remaining WAL files from the object storage archive and optionally connect to an existing master server after that.

Preparing PostgreSQL for PGHoard

First, we will need to create a replication user account. We’ll just use the psql command-line client for this:

postgres=# CREATE USER backup WITH REPLICATION PASSWORD ‘secret’;

We also need to allow this new user to make connections to the database. In PostgreSQL this is done by editing the pg_hba.conf configuration file and adding a line something like this:

host  replication  backup  127.0.0.1/32  md5

We’ll also need to ensure our PostgreSQL instance is configured to allow WAL replication out from the server and it has the appropriate wal_level setting. We’ll edit postgresql.conf and edit or add the following settings:

max_wal_senders = 2  # minimum two with pg_receivexlog mode!

Finally, since we have modified PostgreSQL configuration files, we’ll need to restart PostgreSQL to take the new settings into use by running " pg_ctl restart ", " systemctl restart postgresql " or " service postgresql restart", etc depending on the Linux distribution being used.  Note that it’s not enough to "reload" PostgreSQL in case the WAL settings were changed.

Now we are ready on the PostgreSQL side and can move on to PGHoard.

Installing PGHoard

PGHoard’s source distribution includes packaging scripts for Debian, Fedora and Ubuntu.  Instructions for building distribution specific packages can be found in the PGHoard README .  As PGHoard is a Python package it can also be installed on any system with Python 3 by running " pip3 install pghoard ".

Taking backups with PGHoard

PGHoard provides a number of tools that can be launched from the command-line:

  • pghoard – The backup daemon itself, can be run under systemd or sysvinit
  • pghoard_restore – Backup restoration tool
  • pghoard_archive_sync – Command for verifying archive integrity
  • pghoard_create_keys – Backup encryption key utility
  • pghoard_postgres_command – Used as PostgreSQL’s archive_command and restore_command

First, we will launch the pghoard daemon to start taking backups. pghoard requires a small JSON configuration file that contains the settings for the PostgreSQL connection and for the target backup storage. We’ll name the file pghoard.json :

{

"backup_location": "./metadata",

"backup_sites": {

"example-site": {

"nodes": [

{

"host": "127.0.0.1",

"password": "secret",

"port": 5432,

"user": "backup"

}

],

"object_storage": {

"storage_type": "local",

"directory": "./backups"

}

}

}

In the above file we just list where pghoard keep’s its local working directory ( backup_location ), our PostgreSQL connection settings ( nodes ) and where we want to store the backups ( object_storage ). In this example we’ll just write the backup files to a local disk instead of a remote cloud object storage.

Then we just need to run the pghoard daemon and point it to our configuration file:

$ pghoard –short-log –config pghoard.json

DEBUG   Loading JSON config from: ‘./pghoard.json’, signal: None

INFO    pghoard initialized, own_hostname: ‘ohmu1’, cwd: ‘/home/mel/backup’

INFO    Creating a new basebackup for ‘example-site’ because there are currently none

INFO    Started: [‘/usr/bin/pg_receivexlog’, ‘–status-interval’, ‘1’, ‘–verbose’, ‘–directory’, ‘./metadata/example-site/xlog_incoming’, ‘–dbname’, "dbname=’replication’ host=’127.0.0.1′ port=’5432′ replication=’true’ user=’backup’"], running as PID: 8809

INFO    Started: [‘/usr/bin/pg_basebackup’, ‘–format’, ‘tar’, ‘–label’, ‘pghoard_base_backup’, ‘–progress’, ‘–verbose’, ‘–dbname’, "dbname=’replication’ host=’127.0.0.1′ port=’5432′ replication=’true’ user=’backup’", ‘–pgdata’, ‘./metadata/example-site/basebackup_incoming/2016-04-28_0’], running as PID: 8815, basebackup_location: ‘./metadata/example-site/basebackup_incoming/2016-04-28_0/base.tar’

INFO    Compressed 16777216 byte file ‘./metadata/example-site/xlog_incoming/000000010000000000000025’ to 805706 bytes (4%), took: 0.056s

INFO    ‘UPLOAD’ transfer of key: ‘example-site/xlog/000000010000000000000025’, size: 805706, took 0.003s

INFO    Ran: [‘/usr/bin/pg_basebackup’, ‘–format’, ‘tar’, ‘–label’, ‘pghoard_base_backup’, ‘–progress’, ‘–verbose’, ‘–dbname’, "dbname=’replication’ host=’127.0.0.1′ port=’5432′ replication=’true’ user=’backup’", ‘–pgdata’, ‘./metadata/example-site/basebackup_incoming/2016-04-28_0’], took: 0.331s to run, returncode: 0

INFO    Compressed 16777216 byte file ‘./metadata/example-site/xlog_incoming/000000010000000000000026’ to 797357 bytes (4%), took: 0.057s

INFO    ‘UPLOAD’ transfer of key: ‘example-site/xlog/000000010000000000000026’, size: 797357, took 0.011s

INFO    Compressed 80187904 byte file ‘./metadata/example-site/basebackup_incoming/2016-04-28_0/base.tar’ to 15981960 bytes (19%), took: 0.335s

‘UPLOAD’ transfer of key: ‘example-site/basebackup/2016-04-28_0’, size: 15981960, took 0.026s

PGHoard automatically connected to the PostgreSQL database server, noticed that we don’t have any backups and immediately created a new basebackup and started the realtime streaming of WAL files (which act as incremental backups). Each file stored in the backups was first compressed for optimizing the transfer and storage costs.

As long as you keep PGHoard running, it will make full backups using the default schedule (once per 24 hours) and continuously stream WAL files.

Looking at the contents of the " backups " directory, we see that our backups now contain a full database backup plus a couple of WAL files, and some metadata for each of the files:

$ find backups/ -type f

backups/example-site/xlog/000000010000000000000025

backups/example-site/xlog/000000010000000000000025.metadata

backups/example-site/xlog/000000010000000000000026

backups/example-site/xlog/000000010000000000000026.metadata

backups/example-site/basebackup/2016-04-28_0

Available backups can be listed with the pghoard_restore tool:

$ pghoard_restore list-basebackups –config pghoard.json

Available ‘example-site’ basebackups:

Basebackup                                Backup size    Orig size  Start time

—————————————-  ———–  ———–  ——————–

Looks like we are all set. Now let’s try restore!

Restoring a backup

Restoring a backup is a matter of running a single command:

$ pghoard_restore get-basebackup –config pghoard.json –target-dir restore-test

Found 1 applicable basebackup

Basebackup                                Backup size    Orig size  Start time

—————————————-  ———–  ———–  ——————–

example-site/basebackup/2016-04-28_0            15 MB        76 MB  2016-04-28T06:40:46Z

metadata: {‘compression-algorithm’: ‘snappy’, ‘start-wal-segment’: ‘000000010000000000000026’, ‘pg-version’: ‘90406’}

Selecting ‘example-site/basebackup/2016-04-28_0’ for restore

Basebackup complete.

You can start PostgreSQL by running pg_ctl -D restore-test start

On systemd based systems you can run systemctl start postgresql

The pghoard_restore command automatically chooses the latest available backup, downloads, unpacks (and decompresses and decrypts, when those options are used) it to the specified target directory. The end result will be a complete PostgreSQL data directory (e.g. something like /var/lib/postgresql/9.5/main or /var/lib/pgsql/data , depending on the distro), ready to be used by a PostgreSQL instance.

There are more command-line options for more detailed control over the restoration process, for example restoring to a particular point in time or transaction (PITR) or choosing whether the restored database will be acting as a master or a standby.

Backup encryption

In order to encrypt our backups, we’ll need to create an encryption key pair. PGHoard provides a handy command for automatically creating a key pair and storing it into our configuration file:

$ pghoard_create_keys –key-id example –config pghoard.json

Saved new key_id ‘example’ for site ‘example-site’ in ‘pghoard.json’

NOTE: The pghoard daemon does not require the ‘private’ key in its configuration file, it can be stored elsewhere to improve security

Note that in most cases you will want to extract the private key away from the configuration file and store it safely elsewhere away from the machine that makes the backups. The pghoard daemon only needs the encryption public key during normal operation. The private key is only required by the restore tool and the daemon while restoring a backup.

Uploading backups to the cloud

Sending backups to an object storage in the cloud is simple: we just need the cloud’s access credentials and we’ll modify the object_storage section pghoard.json :


"object_storage": {

"aws_access_key_id": "XXX",

"aws_secret_access_key": "XXX",

"bucket_name": "backups",

"region": "eu-central-1",

"storage_type": "s3"

Now when we restart pghoard , the backups are sent to AWS S3 in Frankfurt:

$ pghoard –short-log –config pghoard.json

DEBUG   Loading JSON config from: ‘./pghoard.json’, signal: None

INFO    pghoard initialized, own_hostname: ‘ohmu1’, cwd: ‘/home/mel/backup’

INFO    Started: [‘/usr/bin/pg_receivexlog’, ‘–status-interval’, ‘1’, ‘–verbose’, ‘–directory’, ‘./metadata/example-site/xlog_incoming’, ‘–dbname’, "dbname=’replication’ host=’127.0.0.1′ port=’5432′ replication=’true’ user=’backup’"], running as PID: 8001

INFO    Creating a new basebackup for ‘example-site’ because there are currently none

INFO    Started: [‘/usr/bin/pg_basebackup’, ‘–format’, ‘tar’, ‘–label’, ‘pghoard_base_backup’, ‘–progress’, ‘–verbose’, ‘–dbname’, "dbname=’replication’ host=’127.0.0.1′ port=’5432′ replication=’true’ user=’backup’", ‘–pgdata’, ‘./metadata/example-site/basebackup_incoming/2016-04-28_1’], running as PID: 8014, basebackup_location: ‘./metadata/example-site/basebackup_incoming/2016-04-28_1/base.tar’

INFO    Ran: [‘/usr/bin/pg_basebackup’, ‘–format’, ‘tar’, ‘–label’, ‘pghoard_base_backup’, ‘–progress’, ‘–verbose’, ‘–dbname’, "dbname=’replication’ host=’127.0.0.1′ port=’5432′ replication=’true’ user=’backup’", ‘–pgdata’, ‘./metadata/example-site/basebackup_incoming/2016-04-28_1’], took: 0.350s to run, returncode: 0

INFO    Compressed and encrypted 16777216 byte file ‘./metadata/example-site/xlog_incoming/000000010000000000000027’ to 799445 bytes (4%), took: 0.406s

INFO    Compressed and encrypted 16777216 byte file ‘./metadata/example-site/xlog_incoming/000000010000000000000028’ to 797784 bytes (4%), took: 0.137s

INFO    Compressed and encrypted 80187904 byte file ‘./metadata/example-site/basebackup_incoming/2016-04-28_1/base.tar’ to 15982372 bytes (19%), took: 0.417s

INFO    ‘UPLOAD’ transfer of key: ‘example-site/xlog/000000010000000000000028’, size: 797784, took 0.885s INFO     ‘UPLOAD’ transfer of key: ‘example-site/xlog/000000010000000000000027’, size: 799445, took 1.104s

‘UPLOAD’ transfer of key: ‘example-site/basebackup/2016-04-28_1’, size: 15982372, took 4.911s

The restore tool works the same way regardless of where the backups are stored:

$ pghoard_restore list-basebackups –config pghoard.json

Available ‘example-site’ basebackups:

Basebackup                                Backup size    Orig size  Start time

—————————————-  ———–  ———–  ——————–

PostgreSQL 9.2+ and Python 3.3+ required

Today we released PGHoard version 1.2.0 with support for Python 3.3 and PostgreSQL 9.2 plus enhanced support for handling network outages.  These features were driven by external users, in Aiven we always use the latest PostgreSQL versions (9.5.2 at the time of writing) and access object storages near the database machines.

PGHoard: Tools for making PostgreSQL backups to cloud object storages PGHoard in Aiven.io

We’re happy to talk more about PGHoard and help you set up your backups with it.  You can also sign up for a free trial of ourAiven.io PostgreSQL service where PGHoard will take care of your backups.

Cheers,

Team Aiven

转载本站任何文章请注明:转载至神刀安全网,谢谢神刀安全网 » PGHoard: Tools for making PostgreSQL backups to cloud object storages

分享到:更多 ()

评论 抢沙发

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址