SILVA Backup Routine


TL;DR


  • Backup is scheduled to run every day at 12:00 AM, through a cron job (0 0 * * *) — see the check after this list;
  • Backup verification (restore) is configured with the cron expression 0 0 31 2 * (February 31st, a date that never occurs), so it does not run automatically and must be triggered manually;
  • Each namespace (DEV, TEST and PROD) has its own backup file;
  • There will be a new backup before every deployment to TEST and PROD, since deployments trigger Flyway and apply migrations;
  • Files are saved inside the namespace on a PVC;
  • In the near future, backups will be saved to S3.
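To confirm what is actually scheduled in a given namespace, you can list the CronJobs and their schedules. A minimal sketch, assuming you are logged in with the oc CLI; the namespace names below are illustrative placeholders, not the real ones:

# List CronJobs and their schedules in each namespace (names are examples)
oc get cronjobs -n abc123-dev
oc get cronjobs -n abc123-test
oc get cronjobs -n abc123-prod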

Variables


  • BACKUP_DIR has value /tmp/backup
  • RESTORE_DIR has value /tmp/restore
  • NUM_BACKUPS (Number of backup files to be retained) has value 5
  • JOB_SERVICE_ACCOUNT has value default, but it must be provided as a parameter (see the example after this list)!
  • SUCCESS_JOBS_HISTORY_LIMIT (Number of successful jobs that will be retained) has value 5
  • FAILED_JOBS_HISTORY_LIMIT (Number of failed jobs that will be retained) has value 2
  • JOB_BACKOFF_LIMIT (Number of attempts to try for a successful job outcome) has value 0
  • PVC_SIZE has value 256Mi
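These variables are template parameters. A hedged example of overriding them when instantiating the backup template — the file path comes from the References section below, and the assumption is that the names above match the template's parameter names:

oc process -f common/openshift.backup.yml \
  -p JOB_SERVICE_ACCOUNT=default \
  -p NUM_BACKUPS=5 \
  -p PVC_SIZE=256Mi \
  | oc apply -f -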

Cron job schedules


  • Backup Schedule: 0 0 * * * (Every day at 12:00 AM)
  • Restore Schedule: 0 0 31 2 * (12:00 AM on February 31st, a date that never occurs, so the job is effectively disabled and only runs when triggered manually; see the example below)
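Because the restore schedule never fires on its own, the job can be started on demand from the CronJob. A minimal sketch, assuming the restore CronJob is named database-restore (a hypothetical name; check oc get cronjobs for the real one):

# Create a one-off Job from the CronJob's template
oc create job --from=cronjob/database-restore manual-restore-$(date +%s)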

How it works


In a nutshell:

  1. Runs the Postgres pg_dump tool, passing all the arguments on the command line
  2. Removes old files based on the number of backups that should be retained
  3. Copies the latest backup SQL file to the restore directory, for easy access.

Example command:

pg_dump \
  -U ${POSTGRESQL_USER} \
  -h ${NAME}-${ZONE}-database \
  -d ${POSTGRESQL_DATABASE} \
  --data-only \
  --schema=nr-spar \
  --inserts \
  --no-comments \
  --on-conflict-do-nothing \
  --no-sync \
  --exclude-table=table1 \
  --exclude-table=table2 \
  --file=${BACKUP_DIR}/backup_$(date +%Y-%m-%d).sql \
  && find "${BACKUP_DIR}" -type f -mtime +$NUM_BACKUPS -exec rm -f {} \; \
  && cp ${BACKUP_DIR}/backup_$(date +%Y-%m-%d).sql ${RESTORE_DIR}/W0__restore.sql

Notes:

  • --data-only: Dump only the data, not the schema (data definitions). Table data, large objects, and sequence values are dumped.
  • --inserts: Dump data as INSERT commands rather than COPY; more readable and easier to understand, though slower to restore.
  • --no-comments: Do not dump comments.
  • --on-conflict-do-nothing: Add ON CONFLICT DO NOTHING to the INSERT commands.
  • --no-sync: Do not wait for the dump to be flushed safely to disk. This is faster, but an operating system crash at the wrong moment can leave the dump corrupt.
  • --exclude-table=table1: Exclude the named table from the dump; repeat the flag for each table you don't need.
  • --file: Saves the dump into a file named, for example, /tmp/backup/backup_2024-10-01.sql
  • The find ... -mtime +$NUM_BACKUPS step removes dump files older than NUM_BACKUPS days; with one backup per day, this keeps roughly the last NUM_BACKUPS files.
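To load the copied dump back into a database, the plain psql client can be used. A minimal sketch, assuming the same connection variables as the dump command above (host name and database are as in that example):

psql \
  -U ${POSTGRESQL_USER} \
  -h ${NAME}-${ZONE}-database \
  -d ${POSTGRESQL_DATABASE} \
  -f ${RESTORE_DIR}/W0__restore.sql

Because the dump is data-only and its INSERTs carry ON CONFLICT DO NOTHING, it can be applied to a database that already has the schema (and possibly some of the rows) in place.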

Manual handling, if required


If you need to copy the backup file for any reason, here's how you can do it.

echo "Creating backup folder" mkdir -p backups

echo "Waiting for Pod to Start"

oc wait --for=condition=Ready pod/nr-forest-client-782-database-pvc-backup --timeout=300s

echo "Copying backup file"

oc cp nr-forest-client-782-database-pvc-backup:/tmp/backup/backup_2024-02-05.sql ./backups/backup_2024-02-05.sql

oc cp nr-forest-client-782-database-pvc-backup:/tmp/backup/backup_$(date +%Y-%m-%d).sql ./backups/backup_$(date +%Y-%m-%d).sql

echo "Deleting backup pod"

oc process -f backup.yml | oc delete -f -

oc delete po --field-selector=status.phase==Succeeded
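As a quick sanity check on the copied file (illustrative commands; the date in the file name follows the backup naming convention above):

ls -lh backups/
grep -c "^INSERT" backups/backup_$(date +%Y-%m-%d).sql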

References


  • pg_dump documentation: https://www.postgresql.org/docs/current/app-pgdump.html
  • SPAR OpenShift backup yml file: https://github.com/bcgov/nr-spar/blob/main/common/openshift.backup.yml

If you have questions, reach out to Paulo Cruz.
