Backup/Restore Postgres DB from AWS S3 using WAL-G

WAL-G  is an archival restoration tool for Postgres(beta for MySQL, MariaDB, MongoDB and SQLServer)

It is an continuous arching tool used to easily set up and recover from physical backup in Postgres. It mainly handles the storage and retrieval of physical backups and WAL archives to and from range of various cloud provider like AWS,Azure,GCP etc .

Its has more function over pgbasebackup like, you can use wal-g as a tool for making encrypted, compressed PostgreSQL backups(full and incremental) and push/fetch them to/from storage without saving it on your filesystem.

To set up WAL-G based backup We need

  1. running Postgres sql database which can be backed and restored
  2. AWS S3 storage credential ( s3 bucket , AWS Access & Secret keys.)
  3. envdir which helps to run other programs with a modified environment based on the files in the provided directory
  4. envdir can be installed through the daemontools package:

1.envdir Installation

 mkdir -p /package 
 chmod 1755 /package 
 cd /package 

Download daemontools-0.76.tar.gz into /package
[root@ip-172-31-37-152 package]# wget http://cr.yp.to/daemontools/daemontools-0.76.tar.gz
 --2021-04-09 11:59:03--  http://cr.yp.to/daemontools/daemontools-0.76.tar.gz
 Resolving cr.yp.to (cr.yp.to)… 131.193.32.108, 131.193.32.109

 [root@ip-172-31-37-152 package]# gunzip daemontools-0.76.tar
 [root@ip-172-31-37-152 package]# tar -xpf daemontools-0.76.tar
 [root@ip-172-31-37-152 package]# rm -f daemontools-0.76.tar
 [root@ip-172-31-37-152 package]#
 [root@ip-172-31-37-152 package]# cd admin/daemontools-0.76

The compilation was failing with 

Issue 1
 [root@ip-172-31-37-152 daemontools-0.76]# package/install
 Linking ./src/* into ./compile…
 Compiling everything in ./compile…
 rm -f compile
 sh print-cc.sh > compile
 chmod 555 compile
 ./compile byte_chr.c
 ./compile: line 3: exec: gcc: not found  
 make: *** [byte_chr.o] Error 127

Solution download  for gcc and lib package 
[root@ip-172-31-37-152 package]# yum install gcc
Package gcc-7.3.1-12.amzn2.x86_64

Issue 2
  [root@ip-172-31-37-152 daemontools-0.76]# package/install 
/usr/bin/ld: errno: TLS definition in /lib64/libc.so.6 section .tbss mismatches non-TLS reference in envdir.o
 /lib64/libc.so.6: error adding symbols: Bad value
 collect2: error: ld returned 1 exit status
 make: *** [envdir] Error 1

This is know bug ..just ads these at the end of the lines -include /usr/include/errno.h in src/conf-cc files 

[root@ip-172-31-37-152 daemontools-0.76]# cat src/conf-cc
gcc -O2 -Wimplicit -Wunused -Wcomment -Wchar-subscripts -Wuninitialized -Wshadow -Wcast-qual -Wcast-align -Wwrite-strings  -include /usr/include/errno.h

After this try to compile , it will be complied successfully.

[root@ip-172-31-37-152 daemontools-0.76]# package/install
 Linking ./src/* into ./compile…
 Compiling everything in ./compile…
 Creating /service…
 Adding svscanboot to inittab…
 init should start svscan now. 

Check the envdir 

[root@ip-172-31-37-152 daemontools-0.76]# which envdir
 /usr/local/bin/envdir



2. Install WAL-G from github link

[root@ip-172-31-37-152 ~]# wget https://github.com/wal-g/wal-g/releases/download/v0.2.15/wal-g.linux-amd64.tar.gz

--2021-04-08 21:15:02--  https://github.com/wal-g/wal-g/releases/download/v0.2.15/wal-g.linux-amd64.tar.gz
Resolving github.com (github.com)… 13.234.176.102
100%[===================================================================================================================>] 11,917,016  11.1MB/s   in 1.0s
 2021-04-08 21:15:03 (11.1 MB/s) - ‘wal-g.linux-amd64.tar.gz’ saved [11917016/11917016]


Untar the downloaded file 

[root@ip-172-31-37-152 ~]# tar -zxvf wal-g.linux-amd64.tar.gz
 wal-g

Move to /usr/local/bin

[root@ip-172-31-37-152 ~]# mv wal-g /usr/local/bin/

Verify , we have downloaded the correct version

-bash-4.2$ /usr/local/bin/wal-g --version
wal-g version v0.2.15   f6abd0c 2020.03.24_11:58:03     PostgreSQL


Check all commnds supported using WAL-G 

Usage:
   wal-g [command]
   Available Commands:
   backup-fetch  Fetches a backup from storage
   backup-list   Prints available backups
   backup-mark   Marks a backup permanent or impermanent
   backup-push   Makes backup and uploads it to storage
   catchup-fetch Fetches an incremental backup from storage
   catchup-push  Creates incremental backup from lsn
   delete        Clears old backups and WALs
   help          Help about any command
   wal-fetch     Fetches a WAL file from storage
   wal-push      Uploads a WAL file to storage
   Flags:
       --config string   config file (default is $HOME/.walg.json)
   -h, --help            help for wal-g
       --version         version for wal-g
  Use "wal-g [command] --help" for more information about a command.

3. Set up environment for WAL-G storage

Create directory  /etc/wal-g.d/env  , it will have files which will stores environment variable and it can be used with envdir 

 [root@ip-172-31-37-152 ~]# umask u=rwx,g=rx,o=
 [root@ip-172-31-37-152 ~]# mkdir -p /etc/wal-g.d/env
 [root@ip-172-31-37-152 ~]# cd /etc/wal-g.d/env
 echo ' your aws secret key' >     /etc/wal-g.d/env/AWS_SECRET_ACCESS_KEY
 echo 'aws access key' >           /etc/wal-g.d/env/AWS_ACCESS_KEY_ID
 echo 's3://your s3 bucket/folder'> /etc/wal-g.d/env/WALG_S3_PREFIX
 echo 'db password' > /etc/wal-g.d/env/PGPASSWORD  
 My s3 bucket looks like 
echo 's3://parwezpostgresbkp' > /etc/wal-g.d/env/WALG_S3_PREFIX


 [root@ip-172-31-37-152 env]# chown -R root:postgres /etc/wal-g.d

 [root@ip-172-31-37-152 ~]# cd /etc/wal-g.d/env
 [root@ip-172-31-37-152 env]#
 [root@ip-172-31-37-152 env]# ls -ltr
 total 16
 -rw-r----- 1 root postgres 41 Apr  8 21:33 AWS_SECRET_ACCESS_KEY
 -rw-r----- 1 root postgres 21 Apr  8 21:33 AWS_ACCESS_KEY_ID
 -rw-r----- 1 root postgres 23 Apr  8 21:33 WALG_S3_PREFIX
 -rw-r----- 1 root postgres 13 Apr  8 21:33 PGPASSWORD


4. Enable wal archive , if not already archived

echo "archive_command = 'envdir /etc/wal-g.d/env /usr/local/bin/wal-g wal-push %p'" >> /opt/PostgreSQL/10/data/postgresql.conf
echo "archive_timeout = 60" >>  /opt/PostgreSQL/10/data/postgresql.conf
echo "archive_mode = on" >>  /opt/PostgreSQL/10/data/postgresql.conf 

-bash-4.2$ grep -i archive postgresql.conf
 archive_mode = on               # enables archiving; off, on, or always
archive_command = 'envdir /etc/wal-g.d/env /usr/local/bin/wal-g wal-push %p'
 archive_timeout = 60

5. Restart the database to make the changes effective

-bash-4.2$  pg_ctl -D /opt/PostgreSQL/10/data/ restart

waiting for server to shut down…. done
server stopped
waiting for server to start….2021-04-09 18:01:25.644 UTC [3684] LOG:   listening on IPv4 address "0.0.0.0", port 5432
2021-04-09 18:01:25.645 UTC [3684] LOG:  listening on IPv6 address "::", port 5432
2021-04-09 18:01:25.650 UTC [3684] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-04-09 18:01:25.663 UTC [3684] LOG:  redirecting log output to logging collector process
2021-04-09 18:01:25.663 UTC [3684] HINT:  Future log output will appear in directory "log".
done
server started

5. Now everything is in place, go ahead and create your first full physicals backup to Amazon s3 bucket

-bash-4.2$ envdir /etc/wal-g.d/env /usr/local/bin/wal-g backup-push /opt/PostgreSQL/10/data/


INFO: 2021/04/09 18:02:19.017277 Doing full backup.
INFO: 2021/04/09 18:02:19.029925 Calling pg_start_backup()
INFO: 2021/04/09 18:02:19.175800 Walking …
INFO: 2021/04/09 18:02:19.175970 Starting part 1 …
INFO: 2021/04/09 18:02:20.687353 Finished writing part 1.
INFO: 2021/04/09 18:02:21.027570 Starting part 2 …
INFO: 2021/04/09 18:02:21.027692 /global/pg_control
INFO: 2021/04/09 18:02:21.030161 Finished writing part 2.
INFO: 2021/04/09 18:02:21.030839 Calling pg_stop_backup()
INFO: 2021/04/09 18:02:23.075693 Starting part 3 …
INFO: 2021/04/09 18:02:23.078105 backup_label
INFO: 2021/04/09 18:02:23.078234 tablespace_map
INFO: 2021/04/09 18:02:23.078310 Finished writing part 3.
INFO: Wrote backup with name  base_000000030000000000000024

Check if backup available on aws s3 bucket 
-bash-4.2$ envdir /etc/wal-g.d/env /usr/local/bin/wal-g backup-list
 name                          last_modified        wal_segment_backup_start
 base_000000030000000000000024 2021-04-09T18:02:24Z 000000030000000000000024

Or go to your s3 bucket to verify the same 
There will be one basebackup folder for all physicals backup and other folder will be there for WAL logs 

6.Now make some changes , in the database

-bash-4.2$ psql
 Password:
 psql.bin (10.13)
 Type "help" for help.
 postgres=#
 postgres=#
 postgres=# show data_directory ;
      data_directory
 /opt/PostgreSQL/10/data
 (1 row)

postgres=# \c pitr_demo
You are now connected to database "pitr_demo" as user "postgres".

 pitr_demo=# \dt
           List of relations
  Schema |   Name   | Type  |  Owner
 --------+----------+-------+----------
  public | bins     | table | postgres
  public | bins_new | table | postgres
  public | pitr     | table | postgres
  public | t1       | table | postgres
 (4 rows)
 pitr_demo=# CREATE TABLE aws_recovry  AS  SELECT * FROM GENERATE_SERIES(1, 10) AS id;
 SELECT 10

pitr_demo=# checkpoint ;
 CHECKPOINT

7. Create another backup , which will backup the new changes

-bash-4.2$ envdir /etc/wal-g.d/env /usr/local/bin/wal-g backup-push /opt/PostgreSQL/10/data/
 INFO: 2021/04/09 18:06:02.485687 Doing full backup.
 INFO: 2021/04/09 18:06:02.493826 Calling pg_start_backup()
 INFO: 2021/04/09 18:06:02.639368 Walking …
 INFO: 2021/04/09 18:06:02.639560 Starting part 1 …
 INFO: 2021/04/09 18:06:03.183496 Finished writing part 1.
 INFO: 2021/04/09 18:06:03.469442 Starting part 2 …
 INFO: 2021/04/09 18:06:03.469467 /global/pg_control
 INFO: 2021/04/09 18:06:03.471489 Finished writing part 2.
 INFO: 2021/04/09 18:06:03.472186 Calling pg_stop_backup()
 INFO: 2021/04/09 18:06:05.518580 Starting part 3 …
 INFO: 2021/04/09 18:06:05.520622 backup_label
 INFO: 2021/04/09 18:06:05.520684 tablespace_map
 INFO: 2021/04/09 18:06:05.520743 Finished writing part 3.
 INFO:  Wrote backup with name base_000000030000000000000027

Now since we have both base and wal backup , we will be going to restore this backup to another postgres sql install , which is running on the same server using diffrent port 5433 and data_dir path. You can even test this on different server or if somehow the same database got deleted

Since , I am restoring on the same server , I will be using the same environment variable which were created in /etc/wal-g.d/env. If you are restoring on different server , you need to create these env variable with similar credentials as we had done while creating all files in /etc/wal-g.d/env.

I will be restoring the database backup to postgresql db with data directory as /opt/newclusterdb/.

Remove all files in the /opt/newclusterdb/

-bash-4.2$ cd /opt/newclusterdb/

-bash-4.2$ ls -ltr
 total 76
 drwx------ 2 postgres postgres     6 Apr  9 14:03 tablespace_exp
 drwx------ 2 postgres postgres    19 Apr  9 14:03 pg_tblspc
 -rw------- 1 postgres postgres     0 Apr  9 14:03 tablespace_map

-bash-4.2$ rm -rf *
-bash-4.2$ ls -lrt
 total 0
 -bash-4.2$ pwd
/opt/newclusterdb

Since I am running posgres10 , I will be using recovery.conf file for point in time restore , for postgres12 onwards these recovery.conf files doesn’t supported , all restore_command and recovery_target are specified in the main postgresql.conf.

8. Create recovery.conf

-bash-4.2$ cat recovery.conf
restore_command = '/usr/bin/envdir /etc/wal-g.d/env /usr/local/bin/wal-g  wal-fetch "%f" "%p" >> /tmp/wal.log 2>&1'
 
recovery_target_time = '2021-04-09 18:06:04'

Now restore the physicals backup via envdir

Make sure all files are deleted from data_dir 
-bash-4.2$ rm -rf   /opt/newclusterdb/

Restore physicals files using backup-fetch

-bash-4.2$ envdir /etc/wal-g.d/env /usr/local/bin/wal-g backup-fetch /opt/newclusterdb LATEST


INFO: 2021/04/09 18:17:14.600147 LATEST backup is: 'base_000000030000000000000027'
INFO: 2021/04/09 18:17:14.758900 Finished decompression of  part_003.tar.lz4
INFO: 2021/04/09 18:17:14.758913 Finished extraction of part_003.tar.lz4
INFO: 2021/04/09 18:17:17.712228 Finished decompression of part_001.tar.lz4
INFO: 2021/04/09 18:17:17.712342 Finished extraction of part_001.tar.lz4
INFO: 2021/04/09 18:17:17.731704 Finished decompression of pg_control.tar.lz4
INFO: 2021/04/09 18:17:17.731768 Finished extraction of pg_control.tar.lz4
INFO: 2021/04/09 18:17:17.731814
Backup extraction complete.

9. Check the data_dir where we have restored , if all files are restored or not ?

-bash-4.2$ ls -ltr
 total 64
 -rw------- 1 postgres postgres     0 Apr  9 18:17 tablespace_map
 -rw------- 1 postgres postgres   234 Apr  9 18:17 backup_label
 -rw------- 1 postgres postgres     3 Apr  9 18:17 PG_VERSION
 -rw------- 1 postgres postgres   208 Apr  9 18:17 backup_label.old
 drwx------ 9 postgres postgres    93 Apr  9 18:17 base
 -rw------- 1 postgres postgres    44 Apr  9 18:17 current_logfiles
 -rw------- 1 postgres postgres  1636 Apr  9 18:17 pg_ident.conf
 -rw------- 1 postgres postgres  4398 Apr  9 18:17 pg_hba.conf
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_dynshmem
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_commit_ts
 drwxr-xr-x 2 postgres postgres     6 Apr  9 18:17 log
 drwx------ 2 postgres postgres    18 Apr  9 18:17 pg_xact
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_wal
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_twophase
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_tblspc
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_subtrans
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_stat_tmp
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_stat
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_snapshots
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_serial
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_replslot
 drwx------ 2 postgres postgres     6 Apr  9 18:17 pg_notify
 drwx------ 4 postgres postgres    36 Apr  9 18:17 pg_multixact
 drwx------ 4 postgres postgres    68 Apr  9 18:17 pg_logical
 -rw-rw-r-- 1 postgres postgres   109 Apr  9 18:17 recovery.done
 -rw-rw-r-- 1 postgres postgres 23197 Apr  9 18:17 postgresql.conf
 -rw------- 1 postgres postgres   129 Apr  9 18:17 postgresql.auto.conf
 -rw------- 1 postgres postgres     0 Apr  9 18:17 tablespace_map.old
 drwx------ 2 postgres postgres  4096 Apr  9 18:17 global

10. Now using recovery.conf ( copy the recovery.conf to data_dir dorectory file try to start the postgresql database it will do the necessary recovery

-bash-4.2$ pg_ctl -D /opt/newclusterdb  start
 waiting for server to start…. done
 server started

Check the logs

-bash-4.2$ cat postgresql-2021-04-09_191448.log
2021-04-09 19:14:48.566 UTC [4499] LOG:  database system was interrupted; last known up at 2021-04-09 18:06:02 UTC
2021-04-09 19:14:48.586 UTC [4499] LOG:  starting point-in-time recovery to 2021-04-09 18:06:04+00
2021-04-09 19:14:48.589 UTC [4499] LOG:  restored log file "00000003.history" from archive

Now , check the database if the table which were created exits

-bash-4.2$ psql -p 5433
 Password:
 psql.bin (10.13)
 Type "help" for help.
 postgres=#
 postgres=#
 postgres=# show data_directory ;
      data_directory
 /opt/newclusterdb 
 (1 row)
postgres=# \c pitr_demo
 You are now connected to database "pitr_demo" as user "postgres".
pitr_demo=# select count(*) from aws_recovry ;
  count
 10
 (1 row)

 721 total views,  2 views today

Leave a Reply

Your email address will not be published. Required fields are marked *