cronjob-etcd-backup - init add (#83)
itewk authored May 4, 2022
1 parent d8e5c56 commit 82659b3
Showing 12 changed files with 347 additions and 0 deletions.
23 changes: 23 additions & 0 deletions charts/cronjob-etcd-backup/.helmignore
@@ -0,0 +1,23 @@
# Patterns to ignore when building packages.
# This supports shell glob matching, relative path matching, and
# negation (prefixed with !). Only one pattern per line.
.DS_Store
# Common VCS dirs
.git/
.gitignore
.bzr/
.bzrignore
.hg/
.hgignore
.svn/
# Common backup files
*.swp
*.bak
*.tmp
*.orig
*~
# Various IDEs
.project
.idea/
*.tmproj
.vscode/
5 changes: 5 additions & 0 deletions charts/cronjob-etcd-backup/Chart.yaml
@@ -0,0 +1,5 @@
apiVersion: v2
name: cronjob-etcd-backup
description: Deploys a CronJob for creating automated backups of ETCD and storing them on a PersistentVolume
type: application
version: 1.0.0
61 changes: 61 additions & 0 deletions charts/cronjob-etcd-backup/README.md
@@ -0,0 +1,61 @@
# cronjob-etcd-backup

Creates a CronJob that creates etcd backups on a schedule and stores them on a PersistentVolume.

This essentially automates the [officially documented etcd backup process](https://docs.openshift.com/container-platform/4.10/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html) with the additional step
that the etcd backup is moved to external storage via a PVC.
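The copy-to-PVC step boils down to placing every file from the node-local backup directory into one new timestamped directory on the persistent volume. Below is a minimal bash sketch of that step. It assumes local stand-in directories instead of `/host/home/core/backup` and `/mnt/backup`, and it uses a hypothetical `copy_to_persistent` function that is not part of the chart:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the "copy etcd backup to persistent volume" step,
# run against local directories instead of the host path and the PVC mount.
# Usage: copy_to_persistent SRC_DIR DEST_ROOT
copy_to_persistent() {
  local src="$1" dest_root="$2" dest
  # Compute the timestamp once, so mkdir and cp always target the same
  # directory even if the clock ticks over to the next second in between.
  dest="${dest_root}/$(date "+%F_%H%M%S")"
  mkdir -p "$dest"
  cp "$src"/* "$dest"/
  # Print the directory that now holds this run's backup files.
  echo "$dest"
}
```

Each run therefore produces exactly one dated directory on the volume, which is the unit that the retention setting later prunes.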

> :exclamation: **Uses the `privileged` SCC.** See [Permissions](#permissions).

## Use

### Manual
Install and test the Helm chart:
```bash
helm upgrade --install cronjob-etcd-backup ./cronjob-etcd-backup --namespace openshift-etcd-backup --create-namespace
helm test cronjob-etcd-backup --namespace openshift-etcd-backup
```
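To get a backup right away instead of waiting for `cronJobSchedule`, you need the CronJob's name. The chart derives that name from the release name through the `cronjob-etcd-backup.fullname` helper in `templates/_helpers.tpl`. The bash function below is a hypothetical re-implementation of that naming rule and is not shipped with the chart:

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the "cronjob-etcd-backup.fullname" template.
# Usage: chart_fullname RELEASE CHART [NAME_OVERRIDE] [FULLNAME_OVERRIDE]
chart_fullname() {
  local release="$1" chart="$2" name_override="${3:-}" fullname_override="${4:-}"
  local name result
  if [[ -n "$fullname_override" ]]; then
    result="$fullname_override"
  else
    # Equivalent of: default .Chart.Name .Values.nameOverride
    name="${name_override:-$chart}"
    if [[ "$release" == *"$name"* ]]; then
      # The release name already contains the chart name, so use it as-is.
      result="$release"
    else
      result="${release}-${name}"
    fi
  fi
  # Equivalent of: trunc 63 | trimSuffix "-" (strips a single trailing dash)
  result="${result:0:63}"
  printf '%s\n' "${result%-}"
}
```

For the install command above (release `cronjob-etcd-backup`), this yields `cronjob-etcd-backup`. An on-demand backup can then be started the same way the chart's `helm test` does it, for example `oc create job manual-etcd-backup --from=cronjob/cronjob-etcd-backup --namespace openshift-etcd-backup`, where `manual-etcd-backup` is an arbitrary Job name.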

### ArgoCD
There are innumerable ways and opinions on doing GitOps, and even within ArgoCD there
are many approaches. Here is a starting point if you don't already have an opinion.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: openshift-etcd-backup
spec:
  destination:
    name: ''
    namespace: openshift-etcd-backup
    server: 'https://kubernetes.default.svc'
  source:
    path: charts/cronjob-etcd-backup
    repoURL: 'https://github.com/redhat-cop/openshift-management.git'
    targetRevision: master
    helm:
      values: |
        pvcStorage: 100Gi
        pvcStorageClassName:
        cronJobSchedule: '5 0 * * *'
        cronJobDaysToKeepPersistentETCDBackups: 5
  project: default
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
```
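`cronJobDaysToKeepPersistentETCDBackups` is passed to `find -mtime +N` in the CronJob. Because `find` discards the fractional part of a file's age in days before comparing, the cutoff is easy to misjudge. The following is a minimal sketch of that retention rule. It assumes GNU findutils and coreutils and uses a hypothetical `prune_backups` function that is not part of the chart:

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the retention rule applied to the backup volume.
# Usage: prune_backups ROOT DAYS
prune_backups() {
  local root="$1" days="$2"
  # -mindepth 1 -maxdepth 1: consider only the timestamped directories directly
  # under ROOT. This never removes ROOT itself and never descends into a
  # directory that has just been deleted.
  # -mtime +DAYS: whole days of age (fraction discarded) must exceed DAYS.
  find "$root" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" -exec rm -r {} +
}
```

With the default of `5`, a backup that is 5.5 days old is kept. A backup is only removed once it is at least 6 full days old.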
## Permissions
Yes, this chart uses the `privileged` SecurityContextConstraints (SCC), but not out of laziness; it is
out of necessity. To run the `cluster-backup.sh` script on a control plane node, you not only
need to mount the host file system, you also need to be able to sudo (run commands as root on the host).

The [officially documented etcd backup process](https://docs.openshift.com/container-platform/4.10/backup_and_restore/control_plane_backup_and_restore/backing-up-etcd.html)
has you manually create a debug pod for a control plane node to accomplish this. When you automate the
process, the container created by the CronJob needs the same permissions that a debug pod
for a control plane node would have. This uses no more permissions than the
documented manual process does; it just gives them to the "robot".
18 changes: 18 additions & 0 deletions charts/cronjob-etcd-backup/templates/ClusterRole.yml
@@ -0,0 +1,18 @@
---
# NOTE: Running the etcd backup commands requires the ability to sudo
# (act as root on the host), hence the need for the privileged SCC.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "cronjob-etcd-backup.fullname" . }}
  labels:
    {{- include "cronjob-etcd-backup.labels" . | nindent 4 }}
rules:
  - verbs:
      - use
    apiGroups:
      - security.openshift.io
    resources:
      - securitycontextconstraints
    resourceNames:
      - privileged
15 changes: 15 additions & 0 deletions charts/cronjob-etcd-backup/templates/ClusterRoleBinding.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: {{ include "cronjob-etcd-backup.fullname" . }}
  labels:
    {{- include "cronjob-etcd-backup.labels" . | nindent 4 }}
subjects:
  - kind: ServiceAccount
    name: {{ include "cronjob-etcd-backup.fullname" . }}
    namespace: {{ .Release.Namespace }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ include "cronjob-etcd-backup.fullname" . }}
83 changes: 83 additions & 0 deletions charts/cronjob-etcd-backup/templates/CronJob.yml
@@ -0,0 +1,83 @@
---
kind: CronJob
apiVersion: batch/v1
metadata:
  name: {{ include "cronjob-etcd-backup.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "cronjob-etcd-backup.labels" . | nindent 4 }}
spec:
  schedule: "{{ .Values.cronJobSchedule }}"
  concurrencyPolicy: Forbid
  suspend: false
  jobTemplate:
    metadata:
      labels:
        {{- include "cronjob-etcd-backup.labels" . | nindent 8 }}
    spec:
      backoffLimit: 0
      template:
        metadata:
          labels:
            {{- include "cronjob-etcd-backup.labels" . | nindent 12 }}
        spec:
          nodeSelector:
            node-role.kubernetes.io/master: ''
          restartPolicy: Never
          activeDeadlineSeconds: 500
          serviceAccountName: {{ include "cronjob-etcd-backup.fullname" . }}
          hostPID: true
          hostNetwork: true
          enableServiceLinks: true
          schedulerName: default-scheduler
          terminationGracePeriodSeconds: 30
          securityContext: {}
          containers:
            - resources: {}
              terminationMessagePath: /dev/termination-log
              name: {{ include "cronjob-etcd-backup.fullname" . }}
              command:
                - /bin/bash
                - '-c'
                - >-
                  echo -e '\n\n---\nCreate etcd backup local to master\n' &&
                  chroot /host /usr/local/bin/cluster-backup.sh /home/core/backup/ &&
                  echo -e '\n\n---\nCleanup old local etcd backups\n' &&
                  chroot /host find /home/core/backup/ -type f -mmin +"2" -delete &&
                  echo -e '\n\n---\nCopy etcd backup to persistent volume\n' &&
                  BACKUP_DIR=/mnt/backup/$(date "+%F_%H%M%S") && mkdir -pv "${BACKUP_DIR}" &&
                  cp -v /host/home/core/backup/* "${BACKUP_DIR}" &&
                  echo -e "\n\n---\nDelete persistent etcd backups older than ${DAYS_TO_KEEP_PERSISTENT_ETCD_BACKUPS} days\n" &&
                  find /mnt/backup/ -mindepth 1 -maxdepth 1 -type d -mtime +${DAYS_TO_KEEP_PERSISTENT_ETCD_BACKUPS} -exec rm -rv {} \; &&
                  echo -e '\n\n---\nList all etcd backups\n' &&
                  ls -al /mnt/backup/*
              env:
                - name: DAYS_TO_KEEP_PERSISTENT_ETCD_BACKUPS
                  value: "{{ .Values.cronJobDaysToKeepPersistentETCDBackups }}"
              securityContext:
                privileged: true
                runAsUser: 0
                capabilities:
                  add:
                    - SYS_CHROOT
              imagePullPolicy: Always
              volumeMounts:
                - name: backup
                  mountPath: /mnt/backup
                - name: host
                  mountPath: /host
              terminationMessagePolicy: File
              image: {{ .Values.cronJobImage }}
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: {{ include "cronjob-etcd-backup.fullname" . }}
            - name: host
              hostPath:
                path: /
                type: Directory
          dnsPolicy: ClusterFirst
          tolerations:
            - key: node-role.kubernetes.io/master
  successfulJobsHistoryLimit: {{ .Values.cronJobSuccessfulJobsHistoryLimit }}
  failedJobsHistoryLimit: {{ .Values.cronJobFailedJobsHistoryLimit }}
1 change: 1 addition & 0 deletions charts/cronjob-etcd-backup/templates/NOTES.txt
@@ -0,0 +1 @@
etcd is now being automatically backed up on schedule: {{ .Values.cronJobSchedule }}
17 changes: 17 additions & 0 deletions charts/cronjob-etcd-backup/templates/PersistentVolumeClaim.yml
@@ -0,0 +1,17 @@
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: {{ include "cronjob-etcd-backup.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "cronjob-etcd-backup.labels" . | nindent 4 }}
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: {{ .Values.pvcStorage }}
  {{ if .Values.pvcStorageClassName }}
  storageClassName: {{ .Values.pvcStorageClassName }}
  {{ end }}
8 changes: 8 additions & 0 deletions charts/cronjob-etcd-backup/templates/ServiceAccount.yml
@@ -0,0 +1,8 @@
---
kind: ServiceAccount
apiVersion: v1
metadata:
  name: {{ include "cronjob-etcd-backup.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "cronjob-etcd-backup.labels" . | nindent 4 }}
43 changes: 43 additions & 0 deletions charts/cronjob-etcd-backup/templates/_helpers.tpl
@@ -0,0 +1,43 @@
{{/*
Expand the name of the chart.
*/}}
{{- define "cronjob-etcd-backup.name" -}}
{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Create a default fully qualified app name.
We truncate at 63 chars because some Kubernetes name fields are limited to this (by the DNS naming spec).
If release name contains chart name it will be used as a full name.
*/}}
{{- define "cronjob-etcd-backup.fullname" -}}
{{- if .Values.fullnameOverride }}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- $name := default .Chart.Name .Values.nameOverride }}
{{- if contains $name .Release.Name }}
{{- .Release.Name | trunc 63 | trimSuffix "-" }}
{{- else }}
{{- printf "%s-%s" .Release.Name $name | trunc 63 | trimSuffix "-" }}
{{- end }}
{{- end }}
{{- end }}

{{/*
Create chart name and version as used by the chart label.
*/}}
{{- define "cronjob-etcd-backup.chart" -}}
{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }}
{{- end }}

{{/*
Common labels
*/}}
{{- define "cronjob-etcd-backup.labels" -}}
app.kubernetes.io/name: {{ template "cronjob-etcd-backup.name" . }}
app.kubernetes.io/component: cronjob
app.kubernetes.io/part-of: {{ .Values.partOf }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
app.kubernetes.io/instance: {{ .Release.Name }}
helm.sh/chart: {{ include "cronjob-etcd-backup.chart" . }}
{{- end }}
59 changes: 59 additions & 0 deletions charts/cronjob-etcd-backup/templates/tests/test-cronjob.yaml
@@ -0,0 +1,59 @@

{{- define "cronjob-etcd-backup.helmTestCronJobServiceAccountName" -}}
{{- printf "helm-test-cronjob-%s" (include "cronjob-etcd-backup.fullname" .) | trunc 63 | trimSuffix "-" }}
{{- end }}

---
apiVersion: v1
kind: Pod
metadata:
  name: "{{ include "cronjob-etcd-backup.fullname" . }}-test-cronjob"
  labels:
    {{- include "cronjob-etcd-backup.labels" . | nindent 4 }}
  annotations:
    "helm.sh/hook": test
spec:
  serviceAccountName: {{ include "cronjob-etcd-backup.helmTestCronJobServiceAccountName" . }}
  automountServiceAccountToken: true
  containers:
    - name: wget
      image: registry.redhat.io/openshift4/ose-cli
      command:
        - /bin/bash
        - -ec
        - |
          NAMESPACE="{{ .Release.Namespace }}"
          CRONJOB_NAME="{{ include "cronjob-etcd-backup.fullname" . }}"
          JOB_NAME="test-${CRONJOB_NAME}-{{ .Release.Revision }}"
          TEST_TIMEOUT="{{ .Values.testCronJobTimeout }}"
          echo "Create Test Job from CronJob"
          oc create job ${JOB_NAME} --from=cronjob/${CRONJOB_NAME} --namespace ${NAMESPACE}
          echo "Wait for Test Job to complete successfully"
          oc wait --for=condition=complete job/${JOB_NAME} --namespace ${NAMESPACE} --timeout ${TEST_TIMEOUT}
  restartPolicy: Never

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "cronjob-etcd-backup.helmTestCronJobServiceAccountName" . }}
  labels:
    {{- include "cronjob-etcd-backup.labels" . | nindent 4 }}

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ include "cronjob-etcd-backup.helmTestCronJobServiceAccountName" . }}
  labels:
    {{- include "cronjob-etcd-backup.labels" . | nindent 4 }}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: edit
subjects:
  - kind: ServiceAccount
    name: {{ include "cronjob-etcd-backup.helmTestCronJobServiceAccountName" . }}
    namespace: {{ .Release.Namespace }}
14 changes: 14 additions & 0 deletions charts/cronjob-etcd-backup/values.yaml
@@ -0,0 +1,14 @@
---
partOf: cluster-operations

pvcStorage: 100Gi
pvcStorageClassName:

cronJobSchedule: '5 0 * * *'
cronJobSuccessfulJobsHistoryLimit: 5
cronJobFailedJobsHistoryLimit: 5
cronJobImage: registry.redhat.io/openshift4/ose-cli

cronJobDaysToKeepPersistentETCDBackups: 5

testCronJobTimeout: 120s
